Message ID | 1400163562-7481-1-git-send-email-msalter@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> The following happens when trying to run a kvm guest on a kernel
> configured for 64k pages. This doesn't happen with 4k pages:
>
> BUG: failure at include/linux/mm.h:297/put_page_testzero()!
> Kernel panic - not syncing: BUG!
> CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF 3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
> Call trace:
> [<fffffe0000096034>] dump_backtrace+0x0/0x16c
> [<fffffe00000961b4>] show_stack+0x14/0x1c
> [<fffffe000066e648>] dump_stack+0x84/0xb0
> [<fffffe0000668678>] panic+0xf4/0x220
> [<fffffe000018ec78>] free_reserved_area+0x0/0x110
> [<fffffe000018edd8>] free_pages+0x50/0x88
> [<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
> [<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
> [<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
> [<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
> [<fffffe00001edc1c>] __fput+0xb0/0x288
> [<fffffe00001ede4c>] ____fput+0xc/0x14
> [<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
> [<fffffe0000095c14>] do_notify_resume+0x54/0x58
>
> In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> happens because a pud_huge() test in unmap_range() returns true when it
> should always be false with the 2-level page tables used by 64k pages.
> This patch removes support for huge puds if 2-level pagetables are
> being used.

Hi Mark,
I'm still catching up with myself, sorry (was off sick for a couple
of days)...

I thought unmap_range was going to be changed?
Does the following help things?
https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

Cheers,
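For context on why pud_huge() misfires here: with a 64K granule, arm64 uses 2-level tables and the pmd is folded, so there is no real pud level at all. The sketch below is a simplified user-space model of the predicate (the names and the bit layout are illustrative only, not the kernel's actual definitions), showing the shape of the guard the patch adds:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of an arm64 page-table descriptor: bit 1 is the
 * "table" bit (set for a table entry, clear for a block mapping).
 * Illustrative only -- not the kernel's real definitions. */
#define TABLE_BIT (1UL << 1)

/* Buggy shape of the check: reports "huge" whenever the table bit is
 * clear, even on configurations where no pud level exists. */
static int pud_huge_buggy(uint64_t pud)
{
	return !(pud & TABLE_BIT);
}

/* Fixed shape: with a folded pmd (2-level tables) there is no real pud
 * level, so a huge pud can never exist and the answer is always 0. */
static int pud_huge_fixed(uint64_t pud, int pmd_folded)
{
	if (pmd_folded)
		return 0;
	return !(pud & TABLE_BIT);
}
```

In the kernel the "pmd_folded" decision is made at compile time via `#ifndef __PAGETABLE_PMD_FOLDED` rather than a runtime flag; the runtime parameter here is just for illustration.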
On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > The following happens when trying to run a kvm guest on a kernel
> > configured for 64k pages. This doesn't happen with 4k pages:
[...]
> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > happens because a pud_huge() test in unmap_range() returns true when it
> > should always be false with the 2-level page tables used by 64k pages.
> > This patch removes support for huge puds if 2-level pagetables are
> > being used.
>
> Hi Mark,
> I'm still catching up with myself, sorry (was off sick for a couple
> of days)...
>
> I thought unmap_range was going to be changed?
> Does the following help things?
> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html

No, I get the same BUG. Regardless, pud_huge() should always return
false for 2-level pagetables, right?
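The failure mode Mark describes — one put_page() too many on the stage2 pgd — can be modeled in miniature. This is a toy refcount, not the kernel's struct page machinery; the -1 return stands in for the BUG() that put_page_testzero() triggers on an over-put:

```c
#include <assert.h>

/* Toy model of the over-put failure: the kernel BUG()s if a page's
 * refcount is already zero when put_page() runs. Here the "BUG" is a
 * -1 return. Illustration only -- not kernel code. */
struct toy_page {
	int refcount;
};

/* Returns 1 if the count dropped to zero (caller should free the page),
 * 0 if the page is still referenced, and -1 for the over-put case that
 * the kernel turns into a panic via put_page_testzero(). */
static int toy_put_page(struct toy_page *p)
{
	if (p->refcount <= 0)
		return -1;	/* the extra put_page(): kernel would BUG() here */
	return --p->refcount == 0 ? 1 : 0;
}
```

The spurious pud_huge() match makes unmap_range() take one extra trip through this path for the pgd page, which is exactly the `refcount <= 0` case.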
On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
>> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
>> > The following happens when trying to run a kvm guest on a kernel
>> > configured for 64k pages. This doesn't happen with 4k pages:
[...]
>> I thought unmap_range was going to be changed?
>> Does the following help things?
>> https://lists.cs.columbia.edu/pipermail/kvmarm/2014-May/009388.html
>
> No, I get the same BUG. Regardless, pud_huge() should always return
> false for 2-level pagetables, right?

Okay, thanks for giving that a go.

Yeah, I agree: for a 64K granule it doesn't make sense to have a huge
pud. The patch looks sound now, but checking for a folded pmd may run
into problems if/when we get to 3 levels with 64K pages in future.

Perhaps checking for PAGE_SHIFT == 12 (or something similar) would be a
bit more robust?

Cheers,
On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
[...]
> Yeah, I agree: for a 64K granule it doesn't make sense to have a huge
> pud. The patch looks sound now, but checking for a folded pmd may run
> into problems if/when we get to 3 levels with 64K pages in future.
>
> Perhaps checking for PAGE_SHIFT == 12 (or something similar) would be a
> bit more robust?

I don't think testing based on granule size is generally correct either.
Support for 3-level page tables with a 64k granule could be added as an
option someday, and that would break a pagesize-based test. With a
folded pmd, we know there is no pud, so pud_huge() should always be
false.
On Thu, May 15, 2014 at 07:39:17PM +0100, Mark Salter wrote:
> On Thu, 2014-05-15 at 18:55 +0100, Steve Capper wrote:
> > On 15 May 2014 17:27, Mark Salter <msalter@redhat.com> wrote:
> > > On Thu, 2014-05-15 at 15:44 +0100, Steve Capper wrote:
> > >> On Thu, May 15, 2014 at 10:19:22AM -0400, Mark Salter wrote:
> > >> > In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
> > >> > on the stage2 pgd which leads to the BUG in put_page_testzero(). This
> > >> > happens because a pud_huge() test in unmap_range() returns true when it
> > >> > should always be false with the 2-level page tables used by 64k pages.
> > >> > This patch removes support for huge puds if 2-level pagetables are
> > >> > being used.
[...]
> I don't think testing based on granule size is generally correct either.
> Support for 3-level page tables with a 64k granule could be added as an
> option someday, and that would break a pagesize-based test. With a
> folded pmd, we know there is no pud, so pud_huge() should always be
> false.

I agree: pud_huge() should be false in the same way we define
pud_present() to be 1 when __PAGETABLE_PMD_FOLDED. The *_huge() macros
aren't covered by the generic headers, unfortunately (some clean-up
would be useful at some point, but for now this patch is fine).
On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 5e9aec3..9bed38f 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
>
>  int pud_huge(pud_t pud)
>  {
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	return !(pud_val(pud) & PUD_TABLE_BIT);
> +#else
> +	return 0;
> +#endif
>  }
>
>  int pmd_huge_support(void)
> @@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
>  	unsigned long ps = memparse(opt, &opt);
>  	if (ps == PMD_SIZE) {
>  		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
> +#ifndef __PAGETABLE_PMD_FOLDED
>  	} else if (ps == PUD_SIZE) {
>  		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
> +#endif

Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
#ifndef here? Maybe the compiler is smart enough to remove it but it's
not on a critical path anyway, so I wouldn't bother.
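Catalin's point — that the second #ifndef guards code which is already dead — can be illustrated with a small user-space sketch. The constants below are made up for a folded-pmd configuration (they are not the kernel's real values); what matters is only that the two sizes coincide, which makes the second branch unreachable:

```c
#include <assert.h>

/* Illustrative constants for a folded-pmd configuration: with the pud
 * level collapsed onto the pmd, PMD_SIZE and PUD_SIZE coincide. The
 * shift value is invented for this sketch, not taken from the kernel. */
#define SKETCH_PMD_SIZE (1UL << 29)
#define SKETCH_PUD_SIZE (1UL << 29)

/* Which branch of a setup_hugepagesz()-style chain a size would hit:
 * 1 = PMD branch, 2 = PUD branch, 0 = unsupported. When the two sizes
 * are equal, the PUD branch is dead code the compiler can drop. */
static int hugepagesz_branch(unsigned long ps)
{
	if (ps == SKETCH_PMD_SIZE)
		return 1;
	else if (ps == SKETCH_PUD_SIZE)	/* unreachable when sizes are equal */
		return 2;
	return 0;
}
```

Because the first comparison always wins for the shared size, the `else if` can never fire, so the #ifndef around it changes nothing observable — which is why it was safe to drop on merge.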
On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> On Thu, May 15, 2014 at 03:19:22PM +0100, Mark Salter wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 5e9aec3..9bed38f 100644
[...]
> Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> #ifndef here? Maybe the compiler is smart enough to remove it but it's
> not on a critical path anyway, so I wouldn't bother.

Yes, I think it would remove it. In any case, one less ifdef would be
a good thing.
On Fri, May 16, 2014 at 04:54:11PM +0100, Mark Salter wrote:
> On Fri, 2014-05-16 at 11:04 +0100, Catalin Marinas wrote:
> > Since PMD_SIZE == PUD_SIZE when __PAGETABLE_PMD_FOLDED, do we need the
> > #ifndef here? Maybe the compiler is smart enough to remove it but it's
> > not on a critical path anyway, so I wouldn't bother.
>
> Yes, I think it would remove it. In any case, one less ifdef would be
> a good thing.

I merged this patch and dropped the last #ifndef. I still have doubts
about the kvm code calling put_page() more than necessary, especially
since pud == pmd and the loop continues after pud_huge() returns true,
but your patch looks harmless. Unless Steve has any objection, I'll
push it to mainline.

I also added:

Cc: stable # v3.11+

Thanks.
diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 5e9aec3..9bed38f 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -51,7 +51,11 @@ int pmd_huge(pmd_t pmd)
 
 int pud_huge(pud_t pud)
 {
+#ifndef __PAGETABLE_PMD_FOLDED
 	return !(pud_val(pud) & PUD_TABLE_BIT);
+#else
+	return 0;
+#endif
 }
 
 int pmd_huge_support(void)
@@ -64,8 +68,10 @@ static __init int setup_hugepagesz(char *opt)
 	unsigned long ps = memparse(opt, &opt);
 	if (ps == PMD_SIZE) {
 		hugetlb_add_hstate(PMD_SHIFT - PAGE_SHIFT);
+#ifndef __PAGETABLE_PMD_FOLDED
 	} else if (ps == PUD_SIZE) {
 		hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);
+#endif
 	} else {
 		pr_err("hugepagesz: Unsupported page size %lu M\n", ps >> 20);
 		return 0;
The following happens when trying to run a kvm guest on a kernel
configured for 64k pages. This doesn't happen with 4k pages:

BUG: failure at include/linux/mm.h:297/put_page_testzero()!
Kernel panic - not syncing: BUG!
CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF 3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
Call trace:
[<fffffe0000096034>] dump_backtrace+0x0/0x16c
[<fffffe00000961b4>] show_stack+0x14/0x1c
[<fffffe000066e648>] dump_stack+0x84/0xb0
[<fffffe0000668678>] panic+0xf4/0x220
[<fffffe000018ec78>] free_reserved_area+0x0/0x110
[<fffffe000018edd8>] free_pages+0x50/0x88
[<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
[<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
[<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
[<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
[<fffffe00001edc1c>] __fput+0xb0/0x288
[<fffffe00001ede4c>] ____fput+0xc/0x14
[<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
[<fffffe0000095c14>] do_notify_resume+0x54/0x58

In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
on the stage2 pgd which leads to the BUG in put_page_testzero(). This
happens because a pud_huge() test in unmap_range() returns true when it
should always be false with the 2-level page tables used by 64k pages.
This patch removes support for huge puds if 2-level pagetables are
being used.

Signed-off-by: Mark Salter <msalter@redhat.com>
---
 arch/arm64/mm/hugetlbpage.c | 6 ++++++
 1 file changed, 6 insertions(+)