Message ID | 20110419033814.3cc7ab5e.takuya.yoshikawa@gmail.com (mailing list archive)
---|---
State | New, archived
On Tue, Apr 19, 2011 at 03:38:14AM +0900, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> We optimize multi level guest page table walk as follows:
>
> 1. We cache the memslot which, probably, includes the next guest page
>    tables to avoid searching for it many times.
> 2. We use get_user() instead of copy_from_user().
>
> Note that this is kind of a restricted way of Xiao's more generic
> work: "KVM: optimize memslots searching and cache GPN to GFN."
>
> With this patch applied, paging64_walk_addr_generic() has improved
> as the following tracing results show.
>
> Before:
> 3.169 us | paging64_walk_addr_generic();
> 1.880 us | paging64_walk_addr_generic();
> 1.243 us | paging64_walk_addr_generic();
> 1.517 us | paging64_walk_addr_generic();
> 3.009 us | paging64_walk_addr_generic();
> 1.814 us | paging64_walk_addr_generic();
> 1.340 us | paging64_walk_addr_generic();
> 1.659 us | paging64_walk_addr_generic();
> 1.748 us | paging64_walk_addr_generic();
> 1.488 us | paging64_walk_addr_generic();
>
> After:
> 1.714 us | paging64_walk_addr_generic();
> 0.806 us | paging64_walk_addr_generic();
> 0.664 us | paging64_walk_addr_generic();
> 0.619 us | paging64_walk_addr_generic();
> 0.645 us | paging64_walk_addr_generic();
> 0.605 us | paging64_walk_addr_generic();
> 1.388 us | paging64_walk_addr_generic();
> 0.753 us | paging64_walk_addr_generic();
> 0.594 us | paging64_walk_addr_generic();
> 0.833 us | paging64_walk_addr_generic();

Nice optimization! What scenarios have you used to test it?

	Joerg
Joerg Roedel <joro@8bytes.org> wrote:
> Nice optimization! What scenarios have you used to test it?
I used my desktop Phenom II box, running the latest qemu-kvm,
so NPT was probably on by default.
The guest was playing a .ogg movie during the test.
I am not an MMU expert, so I would be glad to know which
scenarios should be tested for this patch!
Can I test nested SVM easily, for example?
Thanks,
Takuya
On 04/19/2011 02:38 AM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> We optimize multi level guest page table walk as follows:
>
> 1. We cache the memslot which, probably, includes the next guest page
>    tables to avoid searching for it many times.

Yeah, the hit rate is very high. But after optimizing the memslots searching
algorithm (http://lwn.net/Articles/429308/), maybe the advantage is not so
significant; could you apply that patchset and test again, please?

> 2. We use get_user() instead of copy_from_user().
>
> Note that this is kind of a restricted way of Xiao's more generic
> work: "KVM: optimize memslots searching and cache GPN to GFN."
>
> With this patch applied, paging64_walk_addr_generic() has improved
> as the following tracing results show.
>
> Before:
> 3.169 us | paging64_walk_addr_generic();
> 1.880 us | paging64_walk_addr_generic();
> 1.243 us | paging64_walk_addr_generic();
> 1.517 us | paging64_walk_addr_generic();
> 3.009 us | paging64_walk_addr_generic();
> 1.814 us | paging64_walk_addr_generic();
> 1.340 us | paging64_walk_addr_generic();
> 1.659 us | paging64_walk_addr_generic();
> 1.748 us | paging64_walk_addr_generic();
> 1.488 us | paging64_walk_addr_generic();
>
> After:
> 1.714 us | paging64_walk_addr_generic();
> 0.806 us | paging64_walk_addr_generic();
> 0.664 us | paging64_walk_addr_generic();
> 0.619 us | paging64_walk_addr_generic();
> 0.645 us | paging64_walk_addr_generic();
> 0.605 us | paging64_walk_addr_generic();
> 1.388 us | paging64_walk_addr_generic();
> 0.753 us | paging64_walk_addr_generic();
> 0.594 us | paging64_walk_addr_generic();
> 0.833 us | paging64_walk_addr_generic();
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/paging_tmpl.h |   37 ++++++++++++++++++++++++++++++++-----
>  1 files changed, 32 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 109939a..614aa3f 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -109,12 +109,37 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
>  	return access;
>  }
>
> +/*
> + * Read the guest PTE referred to by table_gfn and offset and put it into ptep.
> + *
> + * *slot_hint, if not NULL, should point to a memslot which probably includes
> + * the guest PTE.  The actual memslot will be put back into this so that
> + * callers can cache it.
> + */
>  static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -				 gfn_t table_gfn, int offset, pt_element_t *ptep)
> +				 gfn_t table_gfn, int offset, pt_element_t *ptep,
> +				 struct kvm_memory_slot **slot_hint)
>  {
> -	return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
> -				       offset, sizeof(*ptep),
> -				       PFERR_USER_MASK | PFERR_WRITE_MASK);
> +	unsigned long addr;
> +	pt_element_t __user *ptep_user;
> +	gfn_t real_gfn;
> +
> +	real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> +				      PFERR_USER_MASK | PFERR_WRITE_MASK);
> +	if (real_gfn == UNMAPPED_GVA)
> +		return -EFAULT;
> +
> +	real_gfn = gpa_to_gfn(real_gfn);
> +
> +	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
> +		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
> +

You forgot to check the result (what if *slot_hint == NULL?) ... ;-)
Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote:

> > We optimize multi level guest page table walk as follows:
> >
> > 1. We cache the memslot which, probably, includes the next guest page
> >    tables to avoid searching for it many times.
>
> Yeah, the hit rate is very high. But after optimizing the memslots searching
> algorithm (http://lwn.net/Articles/429308/), maybe the advantage is not so
> significant; could you apply that patchset and test again, please?

Any sorting strategy, including tree based ones, has tradeoffs. Compared to
that, what I wanted to do here was to improve the table walk locally without
sacrificing other things.

Of course, my strategy depends on the assumption that the page tables will be
in the same slot with very high probability.

So if a certain algorithm is adopted, yes, I will test based on that.
IIRC, no practically good algorithm has been found yet, right?

> > 2. We use get_user() instead of copy_from_user().
> >
> > Note that this is kind of a restricted way of Xiao's more generic
> > work: "KVM: optimize memslots searching and cache GPN to GFN."
> >
> > With this patch applied, paging64_walk_addr_generic() has improved
> > as the following tracing results show.
> >
> > +
> > +	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
> > +		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
> > +
>
> You forgot to check the result (what if *slot_hint == NULL?) ... ;-)

Thank you! I will check later.

	Takuya
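The missing check Xiao points out would be a small addition. Below is a
sketch of what that hunk might look like, assuming gfn_to_memslot() returns
NULL when no memslot contains the gfn; the surrounding lines are taken from
the patch above and the exact placement is only illustrative:

	real_gfn = gpa_to_gfn(real_gfn);

	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn)) {
		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
		/* No memslot contains this gfn: fail like the other error paths. */
		if (!(*slot_hint))
			return -EFAULT;
	}

	addr = gfn_to_hva_memslot(*slot_hint, real_gfn);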
On Tue, Apr 19, 2011 at 10:24:10AM +0900, Takuya Yoshikawa wrote:
> Can I test nested SVM easily, for example?
Yes, nested SVM would be good to test with those changes too. We should
make sure not to break anything there.
Regards,
Joerg
On 04/18/2011 09:38 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> We optimize multi level guest page table walk as follows:
>
> 1. We cache the memslot which, probably, includes the next guest page
>    tables to avoid searching for it many times.
> 2. We use get_user() instead of copy_from_user().
>
> Note that this is kind of a restricted way of Xiao's more generic
> work: "KVM: optimize memslots searching and cache GPN to GFN."

Good optimization.  copy_from_user() really isn't optimized for short
buffers; I expect much of the improvement comes from that.

> +/*
> + * Read the guest PTE referred to by table_gfn and offset and put it into ptep.
> + *
> + * *slot_hint, if not NULL, should point to a memslot which probably includes
> + * the guest PTE.  The actual memslot will be put back into this so that
> + * callers can cache it.
> + */

Please drop the slot_hint optimization.  First, it belongs in a separate
patch.  Second, I prefer to see a generic slot sort instead of an ad-hoc
cache.

>  static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
> -				 gfn_t table_gfn, int offset, pt_element_t *ptep)
> +				 gfn_t table_gfn, int offset, pt_element_t *ptep,
> +				 struct kvm_memory_slot **slot_hint)
>  {
> -	return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
> -				       offset, sizeof(*ptep),
> -				       PFERR_USER_MASK | PFERR_WRITE_MASK);
> +	unsigned long addr;
> +	pt_element_t __user *ptep_user;
> +	gfn_t real_gfn;
> +
> +	real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> +				      PFERR_USER_MASK | PFERR_WRITE_MASK);
> +	if (real_gfn == UNMAPPED_GVA)
> +		return -EFAULT;
> +
> +	real_gfn = gpa_to_gfn(real_gfn);
> +
> +	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
> +		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
> +
> +	addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
> +	if (kvm_is_error_hva(addr))
> +		return -EFAULT;
> +
> +	ptep_user = (pt_element_t __user *)((void *)addr + offset);
> +	return get_user(*ptep, ptep_user);
>  }
>
>  /*
> @@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
>  	gpa_t pte_gpa;
>  	bool eperm, present, rsvd_fault;
>  	int offset, write_fault, user_fault, fetch_fault;
> +	struct kvm_memory_slot *slot_cache = NULL;
>
>  	write_fault = access & PFERR_WRITE_MASK;
>  	user_fault = access & PFERR_USER_MASK;
> @@ -168,7 +194,8 @@ walk:
>  		walker->table_gfn[walker->level - 1] = table_gfn;
>  		walker->pte_gpa[walker->level - 1] = pte_gpa;
>
> -		if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) {
> +		if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn,
> +					  offset, &pte, &slot_cache)) {
>  			present = false;
>  			break;
>  		}
On 04/19/2011 06:47 AM, Takuya Yoshikawa wrote:
> So if a certain algorithm is adopted, yes, I will test based on that.
> IIRC, no practically good algorithm has been found yet, right?

I think a simple sort based on size will provide the same optimization (just
the cache, not get_user()) without any downsides.  Most memory in a guest is
usually in just one or two slots; that's the reason for the high hit rate.
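For concreteness, the sort Avi has in mind could be as simple as keeping the
memslots array ordered by npages, largest first, so the existing linear
search in gfn_to_memslot() hits the big RAM slots on the first iteration. A
rough, untested sketch follows; the nmemslots/memslots field names are
assumed from the kvm_memslots layout of that era, and the resort would only
run when the slot layout changes:

	static void sort_memslots_by_size(struct kvm_memslots *slots)
	{
		int i, j;

		/* Insertion sort: the array is tiny (a few dozen slots at most). */
		for (i = 1; i < slots->nmemslots; i++) {
			struct kvm_memory_slot tmp = slots->memslots[i];

			for (j = i - 1;
			     j >= 0 && slots->memslots[j].npages < tmp.npages;
			     j--)
				slots->memslots[j + 1] = slots->memslots[j];
			slots->memslots[j + 1] = tmp;
		}
	}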
Avi Kivity <avi@redhat.com> writes:
>
> Good optimization.  copy_from_user() really isn't optimized for short
> buffers; I expect much of the improvement comes from that.

Actually it is equivalent to get_user() for the lengths supported by
get_user(), assuming you pass in a constant length.  You probably do not.

-Andi
On Thu, 28 Apr 2011 19:46:00 -0700
Andi Kleen <andi@firstfloor.org> wrote:

> Avi Kivity <avi@redhat.com> writes:
> >
> > Good optimization.  copy_from_user() really isn't optimized for short
> > buffers; I expect much of the improvement comes from that.
>
> Actually it is equivalent to get_user() for the lengths supported by
> get_user(), assuming you pass in a constant length.  You probably do not.
>
> -Andi

Weird, I actually measured some changes even after dropping everything
other than the get_user() usage.

The only reason I can guess is the reduction of some function calls
by inlining some functions.

	Takuya
On Fri, 29 Apr 2011 14:38:08 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> On Thu, 28 Apr 2011 19:46:00 -0700
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > Actually it is equivalent to get_user() for the lengths supported by
> > get_user(), assuming you pass in a constant length.  You probably do not.
>
> Weird, I actually measured some changes even after dropping everything
> other than the get_user() usage.
>
> The only reason I can guess is the reduction of some function calls
> by inlining some functions.

A bit to clarify: the original path was

  kvm_read_guest_page_mmu()
    kvm_read_guest_page()
      copy_from_user()

	Takuya
> The only reason I can guess is the reduction of some function calls
> by inlining some functions.

Yes, at one time cfu was inlined too: it just checked for the right sizes
and then used g*u().  But that got lost in the "icache over everything else"
mania which has unfortunately been en vogue in kernel land for quite some
time (i.e. measuring patches only by their impact on .text size, not by
actual performance).

But you might get better gains by fixing this general c*u() regression.

-Andi
Andi Kleen <andi@firstfloor.org> wrote:

> > The only reason I can guess is the reduction of some function calls
> > by inlining some functions.
>
> Yes, at one time cfu was inlined too: it just checked for the right sizes
> and then used g*u().  But that got lost in the "icache over everything else"
> mania which has unfortunately been en vogue in kernel land for quite some
> time (i.e. measuring patches only by their impact on .text size, not by
> actual performance).
>
> But you might get better gains by fixing this general c*u() regression.

What I mentioned was about not only cfu but three functions.

Originally, we were doing the following function calls:

  walk_addr_generic()            --- 1
    kvm_read_guest_page_mmu()    --- 2
      kvm_read_guest_page()      --- 3
        copy_from_user()         --- 4

And now, with my patch already applied, we are not using the generic
kvm_read_guest_page_mmu(), and some address calculations are done in
walk_addr_generic():

  walk_addr_generic()            --- 1'
    get_user()                   --- 2'

The length is passed in from walk_addr_generic().

Do you think the following case would not differ so much from (1' 2')?

  walk_addr_generic()            --- 1''
    copy_from_user()             --- 2''

This would satisfy your "assuming you pass in a constant length" and be
almost the same as get_user()?

	Takuya
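A sketch of that (1'' 2'') variant, reusing the body of read_guest_pte() from
the patch and only swapping the final access; untested, and shown only to
make the comparison concrete -- sizeof(*ptep) is a compile-time constant
here, which is what Andi's condition requires:

	static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
					 gfn_t table_gfn, int offset, pt_element_t *ptep,
					 struct kvm_memory_slot **slot_hint)
	{
		unsigned long addr;
		pt_element_t __user *ptep_user;
		gfn_t real_gfn;

		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
					      PFERR_USER_MASK | PFERR_WRITE_MASK);
		if (real_gfn == UNMAPPED_GVA)
			return -EFAULT;

		real_gfn = gpa_to_gfn(real_gfn);

		if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
			*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);

		addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
		if (kvm_is_error_hva(addr))
			return -EFAULT;

		ptep_user = (pt_element_t __user *)((void *)addr + offset);

		/* Constant length: copy_from_user() can expand to a fixed-size access. */
		return copy_from_user(ptep, ptep_user, sizeof(*ptep)) ? -EFAULT : 0;
	}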
> Do you think the following case would not differ so much from (1' 2')?
>
>   walk_addr_generic()            --- 1''
>     copy_from_user()             --- 2''

Yes, it should be the same, and is cleaner.

If you do a "make .../foo.i" and look at the code coming out of the
preprocessor, you'll see it expands to:

	if (!__builtin_constant_p(size))
		return copy_user_generic(dst, (__force void *)src, size);
	switch (size) {
	case 1:
		__get_user_asm(*(u8 *)dst, (u8 __user *)src,
			       ret, "b", "b", "=q", 1);
		return ret;
	case 2: ..
	case 4: ..
	case 8: ..
	case 10: ..
	case 16: ..
	}

Ok, it looks like the 32-bit kernel only handles 1/2/4.  Maybe that
was the problem if you ran on 32-bit.

-Andi
On 04/29/2011 07:05 PM, Andi Kleen wrote:
> > Do you think the following case would not differ so much from (1' 2')?
> >
> >   walk_addr_generic()            --- 1''
> >     copy_from_user()             --- 2''
>
> Yes, it should be the same, and is cleaner.
>
> If you do a "make .../foo.i" and look at the code coming out of the
> preprocessor, you'll see it expands to:
>
> 	if (!__builtin_constant_p(size))
> 		return copy_user_generic(dst, (__force void *)src, size);
> 	switch (size) {
> 	case 1:
> 		__get_user_asm(*(u8 *)dst, (u8 __user *)src,
> 			       ret, "b", "b", "=q", 1);
> 		return ret;
> 	case 2: ..
> 	case 4: ..
> 	case 8: ..
> 	case 10: ..
> 	case 16: ..
> 	}
>
> Ok, it looks like the 32-bit kernel only handles 1/2/4.  Maybe that
> was the problem if you ran on 32-bit.

I'm happy with a slower copy_from_user() for that particular case.
>> Ok, it looks like the 32-bit kernel only handles 1/2/4.  Maybe that
>> was the problem if you ran on 32-bit.
>
> I'm happy with a slower copy_from_user() for that particular case.

It wouldn't be hard to fix.

-Andi
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 109939a..614aa3f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -109,12 +109,37 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte)
 	return access;
 }
 
+/*
+ * Read the guest PTE referred to by table_gfn and offset and put it into ptep.
+ *
+ * *slot_hint, if not NULL, should point to a memslot which probably includes
+ * the guest PTE.  The actual memslot will be put back into this so that
+ * callers can cache it.
+ */
 static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-				 gfn_t table_gfn, int offset, pt_element_t *ptep)
+				 gfn_t table_gfn, int offset, pt_element_t *ptep,
+				 struct kvm_memory_slot **slot_hint)
 {
-	return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep,
-				       offset, sizeof(*ptep),
-				       PFERR_USER_MASK | PFERR_WRITE_MASK);
+	unsigned long addr;
+	pt_element_t __user *ptep_user;
+	gfn_t real_gfn;
+
+	real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
+				      PFERR_USER_MASK | PFERR_WRITE_MASK);
+	if (real_gfn == UNMAPPED_GVA)
+		return -EFAULT;
+
+	real_gfn = gpa_to_gfn(real_gfn);
+
+	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
+		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
+
+	addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
+	if (kvm_is_error_hva(addr))
+		return -EFAULT;
+
+	ptep_user = (pt_element_t __user *)((void *)addr + offset);
+	return get_user(*ptep, ptep_user);
 }
 
 /*
@@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 	gpa_t pte_gpa;
 	bool eperm, present, rsvd_fault;
 	int offset, write_fault, user_fault, fetch_fault;
+	struct kvm_memory_slot *slot_cache = NULL;
 
 	write_fault = access & PFERR_WRITE_MASK;
 	user_fault = access & PFERR_USER_MASK;
@@ -168,7 +194,8 @@ walk:
 		walker->table_gfn[walker->level - 1] = table_gfn;
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
-		if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) {
+		if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn,
+					  offset, &pte, &slot_cache)) {
 			present = false;
 			break;
 		}
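The patch relies on a gfn_in_memslot() helper that is not part of this hunk
(presumably introduced elsewhere in the series). A minimal definition
consistent with how it is used here, assuming the base_gfn/npages fields of
struct kvm_memory_slot, would be something like:

	static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
	{
		return gfn >= slot->base_gfn &&
		       gfn < slot->base_gfn + slot->npages;
	}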