Message ID | 20200901133357.52640-3-alexandru.elisei@arm.com (mailing list archive)
---|---
State | New, archived
Series | KVM: arm64: user_mem_abort() improvements

Hi Alexandru,

On 9/1/20 11:33 PM, Alexandru Elisei wrote:
> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
> use the same block size to map the faulting IPA in stage 2. If stage 2
> cannot use the same size mapping because the block size doesn't fit in the
> memslot or the memslot is not properly aligned, user_mem_abort() will fall
> back to a page mapping, regardless of the block size. We can do better for
> PUD backed hugetlbfs by checking if a PMD block mapping is possible before
> deciding to use a page.
>
> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
> its value.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 25e7dc52c086..f590f7355cda 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> else
> vma_shift = PAGE_SHIFT;
>
> - vma_pagesize = 1ULL << vma_shift;
> if (logging_active ||
> - (vma->vm_flags & VM_PFNMAP) ||
> - !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> + (vma->vm_flags & VM_PFNMAP)) {
> force_pte = true;
> - vma_pagesize = PAGE_SIZE;
> vma_shift = PAGE_SHIFT;
> }
>

It looks incorrect because @vma_pagesize wasn't initialized when
it's passed to fault_supports_stage2_huge_mapping() for the checking.
It's assumed you missed the following changes according to the commit
log:

fault_supports_stage2_huge_mapping(memslot, hva, (1UL << vma_shift))

> + if (vma_shift == PUD_SHIFT &&
> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> + vma_shift = PMD_SHIFT;
> +
> + if (vma_shift == PMD_SHIFT &&
> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> + force_pte = true;
> + vma_shift = PAGE_SHIFT;
> + }
> +
> + vma_pagesize = 1UL << vma_shift;
> +
> /*
> * The stage2 has a minimum of 2 level table (For arm64 see
> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> */
> if (vma_pagesize == PMD_SIZE ||
> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
> mmap_read_unlock(current->mm);
>
> /* We need minimum second+third level pages */
>

Thanks,
Gavin

Hi Gavin,

Many thanks for having a look at the patches!

On 9/2/20 2:23 AM, Gavin Shan wrote:
> Hi Alexandru,
>
> On 9/1/20 11:33 PM, Alexandru Elisei wrote:
>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>> use the same block size to map the faulting IPA in stage 2. If stage 2
>> cannot use the same size mapping because the block size doesn't fit in the
>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>> back to a page mapping, regardless of the block size. We can do better for
>> PUD backed hugetlbfs by checking if a PMD block mapping is possible before
>> deciding to use a page.
>>
>> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
>> its value.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
>> 1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 25e7dc52c086..f590f7355cda 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>> else
>> vma_shift = PAGE_SHIFT;
>> - vma_pagesize = 1ULL << vma_shift;
>> if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP) ||
>> - !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
>> + (vma->vm_flags & VM_PFNMAP)) {
>> force_pte = true;
>> - vma_pagesize = PAGE_SIZE;
>> vma_shift = PAGE_SHIFT;
>> }
>>
>
> It looks incorrect because @vma_pagesize wasn't initialized when
> it's passed to fault_supports_stage2_huge_mapping() for the checking.
> It's assumed you missed the following changes according to the commit
> log:
>
> fault_supports_stage2_huge_mapping(memslot, hva, (1UL << vma_shift))

I'm not sure what you mean. Maybe you've misread the diff? Because the above call
to fault_supports_stage2_huge_mapping() was removed by the patch.

Thanks,
Alex

>
>> + if (vma_shift == PUD_SHIFT &&
>> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
>> + vma_shift = PMD_SHIFT;
>> +
>> + if (vma_shift == PMD_SHIFT &&
>> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
>> + force_pte = true;
>> + vma_shift = PAGE_SHIFT;
>> + }
>> +
>> + vma_pagesize = 1UL << vma_shift;
>> +
>> /*
>> * The stage2 has a minimum of 2 level table (For arm64 see
>> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
>> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>> phys_addr_t fault_ipa,
>> */
>> if (vma_pagesize == PMD_SIZE ||
>> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
>> mmap_read_unlock(current->mm);
>> /* We need minimum second+third level pages */
>>
>
> Thanks,
> Gavin
>

Hi Alex,

On 9/2/20 7:01 PM, Alexandru Elisei wrote:
> On 9/2/20 2:23 AM, Gavin Shan wrote:
>> On 9/1/20 11:33 PM, Alexandru Elisei wrote:
>>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>>> use the same block size to map the faulting IPA in stage 2. If stage 2
>>> cannot use the same size mapping because the block size doesn't fit in the
>>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>>> back to a page mapping, regardless of the block size. We can do better for
>>> PUD backed hugetlbfs by checking if a PMD block mapping is possible before
>>> deciding to use a page.
>>>
>>> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
>>> its value.
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
>>> 1 file changed, 14 insertions(+), 5 deletions(-)

Reviewed-by: Gavin Shan <gshan@redhat.com>

>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 25e7dc52c086..f590f7355cda 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>>> phys_addr_t fault_ipa,
>>> else
>>> vma_shift = PAGE_SHIFT;
>>> - vma_pagesize = 1ULL << vma_shift;
>>> if (logging_active ||
>>> - (vma->vm_flags & VM_PFNMAP) ||
>>> - !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
>>> + (vma->vm_flags & VM_PFNMAP)) {
>>> force_pte = true;
>>> - vma_pagesize = PAGE_SIZE;
>>> vma_shift = PAGE_SHIFT;
>>> }
>>>
>>
>> It looks incorrect because
>> @vma_pagesize wasn't initialized when
>> it's passed to fault_supports_stage2_huge_mapping() for the checking.
>> It's assumed you missed the following changes according to the commit
>> log:
>>
>> fault_supports_stage2_huge_mapping(memslot, hva, (1UL << vma_shift))
>
> I'm not sure what you mean. Maybe you've misread the diff? Because the above call
> to fault_supports_stage2_huge_mapping() was removed by the patch.
>

Yeah, your guess is correct as I looked into the removed code :)

>>
>>> + if (vma_shift == PUD_SHIFT &&
>>> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
>>> + vma_shift = PMD_SHIFT;
>>> +
>>> + if (vma_shift == PMD_SHIFT &&
>>> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
>>> + force_pte = true;
>>> + vma_shift = PAGE_SHIFT;
>>> + }
>>> +
>>> + vma_pagesize = 1UL << vma_shift;
>>> +
>>> /*
>>> * The stage2 has a minimum of 2 level table (For arm64 see
>>> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
>>> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu,
>>> phys_addr_t fault_ipa,
>>> */
>>> if (vma_pagesize == PMD_SIZE ||
>>> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>>> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>>> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
>>> mmap_read_unlock(current->mm);
>>> /* We need minimum second+third level pages */
>>>
>>

Thanks,
Gavin

Hi Alex,

On Tue, 01 Sep 2020 14:33:57 +0100,
Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
> use the same block size to map the faulting IPA in stage 2. If stage 2
> cannot use the same size mapping because the block size doesn't fit in the
> memslot or the memslot is not properly aligned, user_mem_abort() will fall
> back to a page mapping, regardless of the block size. We can do better for
> PUD backed hugetlbfs by checking if a PMD block mapping is possible before
> deciding to use a page.
>
> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
> its value.
>
> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
> ---
> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
> 1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> index 25e7dc52c086..f590f7355cda 100644
> --- a/arch/arm64/kvm/mmu.c
> +++ b/arch/arm64/kvm/mmu.c
> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> else
> vma_shift = PAGE_SHIFT;
>
> - vma_pagesize = 1ULL << vma_shift;
> if (logging_active ||
> - (vma->vm_flags & VM_PFNMAP) ||
> - !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
> + (vma->vm_flags & VM_PFNMAP)) {
> force_pte = true;
> - vma_pagesize = PAGE_SIZE;
> vma_shift = PAGE_SHIFT;
> }
>
> + if (vma_shift == PUD_SHIFT &&
> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
> + vma_shift = PMD_SHIFT;
> +
> + if (vma_shift == PMD_SHIFT &&
> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
> + force_pte = true;
> + vma_shift = PAGE_SHIFT;
> + }
> +
> + vma_pagesize = 1UL << vma_shift;
> +
> /*
> * The stage2 has a minimum of 2 level table (For arm64 see
> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> */
> if (vma_pagesize == PMD_SIZE ||
> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
> mmap_read_unlock(current->mm);
>
> /* We need minimum second+third level pages */

Although this looks like a sensible change, I'm reluctant to take it
at this stage, given that we already have a bunch of patches from Will
to change the way we deal with PTs.

Could you look into how this could fit into the new code instead?

Thanks,

M.

Hi Marc,

On 9/4/20 10:58 AM, Marc Zyngier wrote:
> Hi Alex,
>
> On Tue, 01 Sep 2020 14:33:57 +0100,
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>> When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
>> use the same block size to map the faulting IPA in stage 2. If stage 2
>> cannot use the same size mapping because the block size doesn't fit in the
>> memslot or the memslot is not properly aligned, user_mem_abort() will fall
>> back to a page mapping, regardless of the block size. We can do better for
>> PUD backed hugetlbfs by checking if a PMD block mapping is possible before
>> deciding to use a page.
>>
>> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
>> its value.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
>> 1 file changed, 14 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>> index 25e7dc52c086..f590f7355cda 100644
>> --- a/arch/arm64/kvm/mmu.c
>> +++ b/arch/arm64/kvm/mmu.c
>> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> else
>> vma_shift = PAGE_SHIFT;
>>
>> - vma_pagesize = 1ULL << vma_shift;
>> if (logging_active ||
>> - (vma->vm_flags & VM_PFNMAP) ||
>> - !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
>> + (vma->vm_flags & VM_PFNMAP)) {
>> force_pte = true;
>> - vma_pagesize = PAGE_SIZE;
>> vma_shift = PAGE_SHIFT;
>> }
>>
>> + if (vma_shift == PUD_SHIFT &&
>> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
>> + vma_shift = PMD_SHIFT;
>> +
>> + if (vma_shift == PMD_SHIFT &&
>> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
>> + force_pte = true;
>> + vma_shift = PAGE_SHIFT;
>> + }
>> +
>> + vma_pagesize = 1UL << vma_shift;
>> +
>> /*
>> * The stage2 has a minimum of 2 level table (For arm64 see
>> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
>> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>> */
>> if (vma_pagesize == PMD_SIZE ||
>> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
>> mmap_read_unlock(current->mm);
>>
>> /* We need minimum second+third level pages */
> Although this looks like a sensible change, I'm reluctant to take it
> at this stage, given that we already have a bunch of patches from Will
> to change the way we deal with PTs.
>
> Could you look into how this could fit into the new code instead?

Sure, that sounds very sensible. I'm in the process of reviewing Will's series,
and after I'm done I'll rebase this on top of his patches and send it as v2. Does
that sound ok to you? Or do you want me to base this patch on one of your branches?

Thanks,
Alex

On 2020-09-08 13:23, Alexandru Elisei wrote:
> Hi Marc,
>
> On 9/4/20 10:58 AM, Marc Zyngier wrote:
>> Hi Alex,
>>
>> On Tue, 01 Sep 2020 14:33:57 +0100,
>> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>>> When userspace uses hugetlbfs for the VM memory, user_mem_abort()
>>> tries to
>>> use the same block size to map the faulting IPA in stage 2. If stage
>>> 2
>>> cannot use the same size mapping because the block size doesn't fit
>>> in the
>>> memslot or the memslot is not properly aligned, user_mem_abort() will
>>> fall
>>> back to a page mapping, regardless of the block size. We can do
>>> better for
>>> PUD backed hugetlbfs by checking if a PMD block mapping is possible
>>> before
>>> deciding to use a page.
>>>
>>> vma_pagesize is an unsigned long, use 1UL instead of 1ULL when
>>> assigning
>>> its value.
>>>
>>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>>> ---
>>> arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
>>> 1 file changed, 14 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 25e7dc52c086..f590f7355cda 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu
>>> *vcpu, phys_addr_t fault_ipa,
>>> else
>>> vma_shift = PAGE_SHIFT;
>>>
>>> - vma_pagesize = 1ULL << vma_shift;
>>> if (logging_active ||
>>> - (vma->vm_flags & VM_PFNMAP) ||
>>> - !fault_supports_stage2_huge_mapping(memslot, hva,
>>> vma_pagesize)) {
>>> + (vma->vm_flags & VM_PFNMAP)) {
>>> force_pte = true;
>>> - vma_pagesize = PAGE_SIZE;
>>> vma_shift = PAGE_SHIFT;
>>> }
>>>
>>> + if (vma_shift == PUD_SHIFT &&
>>> + !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
>>> + vma_shift = PMD_SHIFT;
>>> +
>>> + if (vma_shift == PMD_SHIFT &&
>>> + !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
>>> + force_pte = true;
>>> + vma_shift = PAGE_SHIFT;
>>> + }
>>> +
>>> + vma_pagesize = 1UL << vma_shift;
>>> +
>>> /*
>>> * The stage2 has a minimum of 2 level table (For arm64 see
>>> * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
>>> @@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu
>>> *vcpu, phys_addr_t fault_ipa,
>>> */
>>> if (vma_pagesize == PMD_SIZE ||
>>> (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
>>> - gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
>>> + gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
>>> mmap_read_unlock(current->mm);
>>>
>>> /* We need minimum second+third level pages */
>> Although this looks like a sensible change, I'm reluctant to take it
>> at this stage, given that we already have a bunch of patches from Will
>> to change the way we deal with PTs.
>>
>> Could you look into how this could fit into the new code instead?
>
> Sure, that sounds very sensible. I'm in the process of reviewing Will's
> series,
> and after I'm done I'll rebase this on top of his patches and send it
> as v2. Does
> that sound ok to you? Or do you want me to base this patch on one of
> your branches?

Either way is fine (kvmarm/next has his patches). Just let me know what
this is based on when you post the patches.

M.

```diff
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 25e7dc52c086..f590f7355cda 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1871,15 +1871,24 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	else
 		vma_shift = PAGE_SHIFT;
 
-	vma_pagesize = 1ULL << vma_shift;
 	if (logging_active ||
-	    (vma->vm_flags & VM_PFNMAP) ||
-	    !fault_supports_stage2_huge_mapping(memslot, hva, vma_pagesize)) {
+	    (vma->vm_flags & VM_PFNMAP)) {
 		force_pte = true;
-		vma_pagesize = PAGE_SIZE;
 		vma_shift = PAGE_SHIFT;
 	}
 
+	if (vma_shift == PUD_SHIFT &&
+	    !fault_supports_stage2_huge_mapping(memslot, hva, PUD_SIZE))
+		vma_shift = PMD_SHIFT;
+
+	if (vma_shift == PMD_SHIFT &&
+	    !fault_supports_stage2_huge_mapping(memslot, hva, PMD_SIZE)) {
+		force_pte = true;
+		vma_shift = PAGE_SHIFT;
+	}
+
+	vma_pagesize = 1UL << vma_shift;
+
 	/*
 	 * The stage2 has a minimum of 2 level table (For arm64 see
 	 * kvm_arm_setup_stage2()). Hence, we are guaranteed that we can
@@ -1889,7 +1898,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
 	 */
 	if (vma_pagesize == PMD_SIZE ||
 	    (vma_pagesize == PUD_SIZE && kvm_stage2_has_pmd(kvm)))
-		gfn = (fault_ipa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;
+		gfn = (fault_ipa & ~(vma_pagesize - 1)) >> PAGE_SHIFT;
 	mmap_read_unlock(current->mm);
 
 	/* We need minimum second+third level pages */
```
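
In the second hunk, the gfn is now derived by masking the faulting IPA with the block size user_mem_abort() actually settled on, rather than with huge_page_mask(): with the new fallback, vma_pagesize can be smaller than the hugetlbfs page backing the VMA, so the two masks can differ. A small standalone illustration of the arithmetic (plain C with made-up values, not kernel code):

```c
#include <stdio.h>
#include <stdint.h>

/* Standalone illustration (not kernel code) of the new gfn calculation:
 * round the faulting IPA down to the stage 2 block size that was chosen,
 * then convert the result to a frame number. The values are made up and
 * assume 4K pages with a 2M (PMD) block. */
#define EXAMPLE_PAGE_SHIFT 12
#define EXAMPLE_PMD_SIZE   (1ULL << 21)

int main(void)
{
	uint64_t fault_ipa    = 0x40365000ULL;     /* hypothetical faulting IPA */
	uint64_t vma_pagesize = EXAMPLE_PMD_SIZE;  /* block size picked by the fault handler */
	uint64_t gfn = (fault_ipa & ~(vma_pagesize - 1)) >> EXAMPLE_PAGE_SHIFT;

	/* Prints: block base 0x40200000, gfn 0x40200 */
	printf("block base 0x%llx, gfn 0x%llx\n",
	       (unsigned long long)(fault_ipa & ~(vma_pagesize - 1)),
	       (unsigned long long)gfn);
	return 0;
}
```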
When userspace uses hugetlbfs for the VM memory, user_mem_abort() tries to
use the same block size to map the faulting IPA in stage 2. If stage 2
cannot use the same size mapping because the block size doesn't fit in the
memslot or the memslot is not properly aligned, user_mem_abort() will fall
back to a page mapping, regardless of the block size. We can do better for
PUD backed hugetlbfs by checking if a PMD block mapping is possible before
deciding to use a page.

vma_pagesize is an unsigned long, use 1UL instead of 1ULL when assigning
its value.

Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
---
 arch/arm64/kvm/mmu.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)
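
The "block size doesn't fit in the memslot or the memslot is not properly aligned" condition above is what fault_supports_stage2_huge_mapping() decides. The sketch below is a hypothetical userspace approximation of that kind of check and of the PUD -> PMD -> page fallback the commit message describes; the structure, names, and exact conditions are illustrative assumptions, not the kernel implementation:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not kernel code: a block-sized stage 2 mapping is only
 * usable when the guest physical address and the userspace address are
 * congruent modulo the block size, and when the whole block is backed by the
 * memslot.
 */
struct memslot_desc {
	uint64_t gpa_start;   /* guest physical base of the memslot */
	uint64_t uaddr_start; /* userspace virtual base of the memslot */
	uint64_t size;        /* memslot size in bytes */
};

bool block_mapping_possible(const struct memslot_desc *slot,
			    uint64_t hva, uint64_t block_size)
{
	uint64_t block_start = hva & ~(block_size - 1);

	/* Misaligned memslot: a block entry would map the wrong pages. */
	if ((slot->gpa_start & (block_size - 1)) !=
	    (slot->uaddr_start & (block_size - 1)))
		return false;

	/* The block must fit entirely inside the memslot. */
	return block_start >= slot->uaddr_start &&
	       block_start + block_size <= slot->uaddr_start + slot->size;
}

/*
 * The fallback described in the commit message: try a PUD-sized block, then a
 * PMD-sized one, then give up and use a single page. 4K granule sizes are
 * assumed; the real code only tries sizes up to the hugetlbfs page size
 * backing the VMA and also accounts for logging and VM_PFNMAP VMAs.
 */
uint64_t pick_stage2_map_size(const struct memslot_desc *slot, uint64_t hva)
{
	const uint64_t pud_size  = 1ULL << 30; /* 1G */
	const uint64_t pmd_size  = 1ULL << 21; /* 2M */
	const uint64_t page_size = 1ULL << 12; /* 4K */

	if (block_mapping_possible(slot, hva, pud_size))
		return pud_size;
	if (block_mapping_possible(slot, hva, pmd_size))
		return pmd_size;
	return page_size;
}
```

With the patch, when this kind of check fails for PUD_SIZE the fault handler retries with PMD_SIZE before giving up and using a single page, instead of dropping straight to a page mapping.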