diff mbox

[1/1,v2] KVM: MMU: Optimize guest page table walk

Message ID 20110422003444.5b3a876a.takuya.yoshikawa@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Takuya Yoshikawa April 21, 2011, 3:34 p.m. UTC
From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

This patch optimizes the guest page table walk by using get_user()
instead of copy_from_user().

With this patch applied, paging64_walk_addr_generic() has become
about 0.5us to 1.0us faster on my Phenom II machine with NPT on.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/paging_tmpl.h |   23 ++++++++++++++++++++---
 1 files changed, 20 insertions(+), 3 deletions(-)

Comments

Takuya Yoshikawa April 26, 2011, 4:50 a.m. UTC | #1
On Mon, 25 Apr 2011 11:15:20 +0200
Jan Kiszka <jan.kiszka@web.de> wrote:

> > Sorry, I did not test on x86_32.
> > 
> > Introducing a wrapper function with ifdef would be the best way?
> > 
> 
> Maybe you could also add the missing 64-bit get_user for x86-32. Given
> that we have a corresponding put_user, I wonder why the get_user was
> left out.
> 
> Jan
> 

Google said that there was a similar talk on LKML in 2004.

On that threads, Linus explained how to tackle on the 64-bit get_user
implementation.  But I could not see what happened after that.

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jan Kiszka April 26, 2011, 6:34 a.m. UTC | #2
On 2011-04-26 06:50, Takuya Yoshikawa wrote:
> On Mon, 25 Apr 2011 11:15:20 +0200
> Jan Kiszka <jan.kiszka@web.de> wrote:
> 
>>> Sorry, I did not test on x86_32.
>>>
>>> Introducing a wrapper function with ifdef would be the best way?
>>>
>>
>> Maybe you could also add the missing 64-bit get_user for x86-32. Given
>> that we have a corresponding put_user, I wonder why the get_user was
>> left out.
>>
>> Jan
>>
> 
> Google said that there was a similar talk on LKML in 2004.
> 
> On that threads, Linus explained how to tackle on the 64-bit get_user
> implementation.  But I could not see what happened after that.

Mmh, maybe the kernel was lacking a real use case, so no one seriously
cared.

I don't see a fundamental blocker for an x86-32 __get_user_8 version
based on two mov. I would give it a try.

Jan
Avi Kivity April 26, 2011, 7:42 a.m. UTC | #3
On 04/25/2011 11:04 AM, Jan Kiszka wrote:
> >  +
> >  +		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
> >  +		if (get_user(pte, ptep_user)) {
>                      ^^^^^^^^^^^^
> This doesn't work for x86-32: pte is 64 bit, but get_user is only
> defined up to 32 bit on that platform.
>

I actually considered this, and saw:

#ifdef CONFIG_X86_32
#define __get_user_8(__ret_gu, __val_gu, ptr)                \
         __get_user_x(X, __ret_gu, __val_gu, ptr)
#else
#define __get_user_8(__ret_gu, __val_gu, ptr)                \
         __get_user_x(8, __ret_gu, __val_gu, ptr)
#endif

#define get_user(x, ptr)                        \
({                                    \
     int __ret_gu;                            \
     unsigned long __val_gu;                        \
     __chk_user_ptr(ptr);                        \
     might_fault();                            \
     switch (sizeof(*(ptr))) {                    \

...

     case 8:                                \
         __get_user_8(__ret_gu, __val_gu, ptr);            \
         break;                            \

...

     }                                \
     (x) = (__typeof__(*(ptr)))__val_gu;                \
     __ret_gu;                            \
})

so it should work.  How does it fail?

> Avi, what's your 32-bit buildbot doing? :)

I regularly autotest on x86_64, not on i386, sorry.
Jan Kiszka April 26, 2011, 7:45 a.m. UTC | #4
On 2011-04-26 09:42, Avi Kivity wrote:
> On 04/25/2011 11:04 AM, Jan Kiszka wrote:
>> >  +
>> >  +        ptep_user = (pt_element_t __user *)((void *)host_addr +
>> offset);
>> >  +        if (get_user(pte, ptep_user)) {
>>                      ^^^^^^^^^^^^
>> This doesn't work for x86-32: pte is 64 bit, but get_user is only
>> defined up to 32 bit on that platform.
>>
> 
> I actually considered this, and saw:
> 
> #ifdef CONFIG_X86_32
> #define __get_user_8(__ret_gu, __val_gu, ptr)                \
>         __get_user_x(X, __ret_gu, __val_gu, ptr)
> #else
> #define __get_user_8(__ret_gu, __val_gu, ptr)                \
>         __get_user_x(8, __ret_gu, __val_gu, ptr)
> #endif
> 
> #define get_user(x, ptr)                        \
> ({                                    \
>     int __ret_gu;                            \
>     unsigned long __val_gu;                        \
>     __chk_user_ptr(ptr);                        \
>     might_fault();                            \
>     switch (sizeof(*(ptr))) {                    \
> 
> ...
> 
>     case 8:                                \
>         __get_user_8(__ret_gu, __val_gu, ptr);            \
>         break;                            \
> 
> ...
> 
>     }                                \
>     (x) = (__typeof__(*(ptr)))__val_gu;                \
>     __ret_gu;                            \
> })
> 
> so it should work.  How does it fail?

On x86-32, the above macro resolves to __get_user_X, an undefined symbol.

> 
>> Avi, what's your 32-bit buildbot doing? :)
> 
> I regularly autotest on x86_64, not on i386, sorry.

Good that it's included in my kvm-kmod buildbot.

Jan
Avi Kivity April 26, 2011, 8:21 a.m. UTC | #5
On 04/26/2011 10:45 AM, Jan Kiszka wrote:
> On 2011-04-26 09:42, Avi Kivity wrote:
> >  On 04/25/2011 11:04 AM, Jan Kiszka wrote:
> >>  >   +
> >>  >   +        ptep_user = (pt_element_t __user *)((void *)host_addr +
> >>  offset);
> >>  >   +        if (get_user(pte, ptep_user)) {
> >>                       ^^^^^^^^^^^^
> >>  This doesn't work for x86-32: pte is 64 bit, but get_user is only
> >>  defined up to 32 bit on that platform.
> >>
> >
> >  I actually considered this, and saw:
> >
> >  #ifdef CONFIG_X86_32
> >  #define __get_user_8(__ret_gu, __val_gu, ptr)                \
> >          __get_user_x(X, __ret_gu, __val_gu, ptr)
> >  #else
> >  #define __get_user_8(__ret_gu, __val_gu, ptr)                \
> >          __get_user_x(8, __ret_gu, __val_gu, ptr)
> >  #endif
> >
> >  #define get_user(x, ptr)                        \
> >  ({                                    \
> >      int __ret_gu;                            \
> >      unsigned long __val_gu;                        \
> >      __chk_user_ptr(ptr);                        \
> >      might_fault();                            \
> >      switch (sizeof(*(ptr))) {                    \
> >
> >  ...
> >
> >      case 8:                                \
> >          __get_user_8(__ret_gu, __val_gu, ptr);            \
> >          break;                            \
> >
> >  ...
> >
> >      }                                \
> >      (x) = (__typeof__(*(ptr)))__val_gu;                \
> >      __ret_gu;                            \
> >  })
> >
> >  so it should work.  How does it fail?
>
> On x86-32, the above macro resolves to __get_user_X, an undefined symbol.
>

Tricky stuff.
Takuya Yoshikawa April 26, 2011, 2:40 p.m. UTC | #6
On Tue, 26 Apr 2011 08:34:57 +0200
Jan Kiszka <jan.kiszka@web.de> wrote:

> > Google said that there was a similar talk on LKML in 2004.
> > 
> > On that threads, Linus explained how to tackle on the 64-bit get_user
> > implementation.  But I could not see what happened after that.
> 
> Mmh, maybe the kernel was lacking a real use case, so no one seriously
> cared.
> 
> I don't see a fundamental blocker for an x86-32 __get_user_8 version
> based on two mov. I would give it a try.
> 
> Jan
> 

Thank you!

Avi, do we revert the patch now, or ...?

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity April 26, 2011, 2:54 p.m. UTC | #7
On 04/26/2011 05:40 PM, Takuya Yoshikawa wrote:
> On Tue, 26 Apr 2011 08:34:57 +0200
> Jan Kiszka<jan.kiszka@web.de>  wrote:
>
> >  >  Google said that there was a similar talk on LKML in 2004.
> >  >
> >  >  On that threads, Linus explained how to tackle on the 64-bit get_user
> >  >  implementation.  But I could not see what happened after that.
> >
> >  Mmh, maybe the kernel was lacking a real use case, so no one seriously
> >  cared.
> >
> >  I don't see a fundamental blocker for an x86-32 __get_user_8 version
> >  based on two mov. I would give it a try.
> >
> >  Jan
> >
>
> Thank you!
>
> Avi, do we revert the patch now, or ...?

Please post a simple patch that uses two get_user()s for that case 
(64-bit pte on 32-bit host).  Then work with the x86 tree to see if 
they'll accept 64-bit get_user(), and once they do, we can go back to a 
simple get_user() again.

btw, I think we can use __get_user() here since the address must have 
been already validated.
Takuya Yoshikawa April 26, 2011, 3:13 p.m. UTC | #8
> Please post a simple patch that uses two get_user()s for that case 
> (64-bit pte on 32-bit host).  Then work with the x86 tree to see if 
> they'll accept 64-bit get_user(), and once they do, we can go back to a 
> simple get_user() again.
> 
> btw, I think we can use __get_user() here since the address must have 
> been already validated.

Yes, I thought that at first.

But don't we need to care KVM's address translation bugs?

Takuya
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Avi Kivity April 26, 2011, 5:15 p.m. UTC | #9
On 04/26/2011 06:13 PM, Takuya Yoshikawa wrote:
> >  Please post a simple patch that uses two get_user()s for that case
> >  (64-bit pte on 32-bit host).  Then work with the x86 tree to see if
> >  they'll accept 64-bit get_user(), and once they do, we can go back to a
> >  simple get_user() again.
> >
> >  btw, I think we can use __get_user() here since the address must have
> >  been already validated.
>
> Yes, I thought that at first.
>
> But don't we need to care KVM's address translation bugs?

It's a pity to do a runtime check when we can do a setup time check 
instead.  So let's review the setup code and then use __get_user().
diff mbox

Patch

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 74f8567..825d953 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -117,6 +117,7 @@  static int FNAME(walk_addr_generic)(struct guest_walker *walker,
 				    gva_t addr, u32 access)
 {
 	pt_element_t pte;
+	pt_element_t __user *ptep_user;
 	gfn_t table_gfn;
 	unsigned index, pt_access, uninitialized_var(pte_access);
 	gpa_t pte_gpa;
@@ -152,6 +153,9 @@  walk:
 	pt_access = ACC_ALL;
 
 	for (;;) {
+		gfn_t real_gfn;
+		unsigned long host_addr;
+
 		index = PT_INDEX(addr, walker->level);
 
 		table_gfn = gpte_to_gfn(pte);
@@ -160,9 +164,22 @@  walk:
 		walker->table_gfn[walker->level - 1] = table_gfn;
 		walker->pte_gpa[walker->level - 1] = pte_gpa;
 
-		if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte,
-					    offset, sizeof(pte),
-					    PFERR_USER_MASK|PFERR_WRITE_MASK)) {
+		real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
+					      PFERR_USER_MASK|PFERR_WRITE_MASK);
+		if (real_gfn == UNMAPPED_GVA) {
+			present = false;
+			break;
+		}
+		real_gfn = gpa_to_gfn(real_gfn);
+
+		host_addr = gfn_to_hva(vcpu->kvm, real_gfn);
+		if (kvm_is_error_hva(host_addr)) {
+			present = false;
+			break;
+		}
+
+		ptep_user = (pt_element_t __user *)((void *)host_addr + offset);
+		if (get_user(pte, ptep_user)) {
 			present = false;
 			break;
 		}