Message ID: 4E37DA73.7010908@cn.fujitsu.com (mailing list archive)
State: New, archived
On 08/02/2011 02:07 PM, Xiao Guangrong wrote:
> The idea is from Avi:
> | tag instructions that are typically used to modify the page tables, and drop
> | shadow if any other instruction is used
> | The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg, and
> | cmpxchg8b
>
> This patch is used to tag the instructions and in the later path, shadow page
> is dropped if it is written by other instructions

We already have a mechanism for this, the decode tables. Why not add a new flag, PageTable, and set it on all relevant instructions?

Note we don't need to actually emulate, just decode, since page_fault can tell us whether a write failed due to page tables or mmio.
On 08/02/2011 08:20 PM, Avi Kivity wrote:
> On 08/02/2011 02:07 PM, Xiao Guangrong wrote:
>> The idea is from Avi:
>> | tag instructions that are typically used to modify the page tables, and drop
>> | shadow if any other instruction is used
>> | The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg, and
>> | cmpxchg8b
>>
>> This patch is used to tag the instructions and in the later path, shadow page
>> is dropped if it is written by other instructions
>
> We already have a mechanism for this, the decode tables. Why not add a new flag, PageTable, and set it on all relevant instructions?

OK, will do it in the next version.

> Note we don't need to actually emulate, just decode, since page_fault can tell us whether a write failed due to page tables or mmio.

This is an interesting feature. If it happens, I will just drop the shadow pages and retry these instructions directly.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/03/2011 09:02 AM, Xiao Guangrong wrote:
>> Note we don't need to actually emulate, just decode, since page_fault can tell us whether a write failed due to page tables or mmio.
>>
> This is an interesting feature. If it happens, I will just drop the shadow pages
> and retry these instructions directly.

Note it's a little dangerous. If the guest uses a non-page-table-modifying instruction on the PDE that points to the instruction, then we will unmap the instruction and get into an infinite loop.

Maybe it's better to emulate if we can't find a fix for that.

One way would be to emulate every 20 instructions; this breaks us out of the loop but reduces costly emulations to 5%.
Hi Avi,

On 08/03/2011 04:09 PM, Avi Kivity wrote:
> On 08/03/2011 09:02 AM, Xiao Guangrong wrote:
>>> Note we don't need to actually emulate, just decode, since page_fault can tell us whether a write failed due to page tables or mmio.
>>>
>> This is an interesting feature. If it happens, I will just drop the shadow pages
>> and retry these instructions directly.
>
> Note it's a little dangerous. If the guest uses a non-page-table-modifying instruction on the PDE that points to the instruction, then we will unmap the instruction and get into an infinite loop.

Yes, it is.

> Maybe it's better to emulate if we can't find a fix for that.
>
> One way would be to emulate every 20 instructions; this breaks us out of the loop but reduces costly emulations to 5%.

After much thought about this, maybe this optimization is not good, since:
- it is a little complex
- this optimization is only applied to instruction emulation caused by #PF
- it does not improve things too much:

  If we emulate the instruction, we need to:
  - decode the instruction
  - emulate it
  - zap shadow pages
  After this, we can return to the guest, and the guest can run the next instruction.

  If we retry the instruction, we need to:
  - decode the instruction
  - zap shadow pages
  Then we return to the guest and retry the instruction; however, we will get a page fault
  again (since the mapping is still read-only), so we get another VM-exit and need to:
  # trigger page fault
  - handle the page fault and change the mapping to writable
  - retry the instruction
  Only now can the guest run the next instruction.

So, I do not think the new way is better. Your opinion?
On 08/03/2011 12:24 PM, Xiao Guangrong wrote:
>> Maybe it's better to emulate if we can't find a fix for that.
>>
>> One way would be to emulate every 20 instructions; this breaks us out of the loop but reduces costly emulations to 5%.
>>
> After much thought about this, maybe this optimization is not good, since:
> - it is a little complex
> - this optimization is only applied to instruction emulation caused by #PF
> - it does not improve things too much:
>
>   If we emulate the instruction, we need to:
>   - decode the instruction
>   - emulate it
>   - zap shadow pages
>   After this, we can return to the guest, and the guest can run the next instruction.
>
>   If we retry the instruction, we need to:
>   - decode the instruction
>   - zap shadow pages
>   Then we return to the guest and retry the instruction; however, we will get a page fault
>   again (since the mapping is still read-only), so we get another VM-exit and need to:
>   # trigger page fault
>   - handle the page fault and change the mapping to writable
>   - retry the instruction
>   Only now can the guest run the next instruction.
>
> So, I do not think the new way is better. Your opinion?

We can change to writable and zap in the same exit, no? Basically call page_fault() again after zapping.
On 08/03/2011 12:41 PM, Xiao Guangrong wrote:
> OK. :-)
>
> Um, how about caching the last eip when we do this optimization? If we meet the same eip, we
> can break out of the potential infinite loop.

Yes, that can work.
On 08/03/2011 05:25 PM, Avi Kivity wrote:
> On 08/03/2011 12:24 PM, Xiao Guangrong wrote:
>>> Maybe it's better to emulate if we can't find a fix for that.
>>>
>>> One way would be to emulate every 20 instructions; this breaks us out of the loop but reduces costly emulations to 5%.
>>>
>> After much thought about this, maybe this optimization is not good, since:
>> - it is a little complex
>> - this optimization is only applied to instruction emulation caused by #PF
>> - it does not improve things too much:
>>
>>   If we emulate the instruction, we need to:
>>   - decode the instruction
>>   - emulate it
>>   - zap shadow pages
>>   After this, we can return to the guest, and the guest can run the next instruction.
>>
>>   If we retry the instruction, we need to:
>>   - decode the instruction
>>   - zap shadow pages
>>   Then we return to the guest and retry the instruction; however, we will get a page fault
>>   again (since the mapping is still read-only), so we get another VM-exit and need to:
>>   # trigger page fault
>>   - handle the page fault and change the mapping to writable
>>   - retry the instruction
>>   Only now can the guest run the next instruction.
>>
>> So, I do not think the new way is better. Your opinion?
>
> We can change to writable and zap in the same exit, no? Basically call page_fault() again after zapping.

OK. :-)

Um, how about caching the last eip when we do this optimization? If we meet the same eip, we can break out of the potential infinite loop.
diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 6040d11..049a6f5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -244,6 +244,7 @@ struct x86_emulate_ctxt {
 	bool guest_mode; /* guest running a nested guest */
 	bool perm_ok; /* do not check permissions if true */
 	bool only_vendor_specific_insn;
+	bool page_table_written_insn;
 	bool have_exception;
 	struct x86_exception exception;

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 0453c07..3c027ac 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2516,6 +2516,7 @@ static int em_add(struct x86_emulate_ctxt *ctxt)
 static int em_or(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV("or", ctxt->src, ctxt->dst, ctxt->eflags);
+	tag_page_table_written_insn(ctxt);
 	return X86EMUL_CONTINUE;
 }

@@ -2534,6 +2535,7 @@ static int em_sbb(struct x86_emulate_ctxt *ctxt)
 static int em_and(struct x86_emulate_ctxt *ctxt)
 {
 	emulate_2op_SrcV("and", ctxt->src, ctxt->dst, ctxt->eflags);
+	tag_page_table_written_insn(ctxt);
 	return X86EMUL_CONTINUE;
 }

@@ -2572,6 +2574,7 @@ static int em_xchg(struct x86_emulate_ctxt *ctxt)
 	/* Write back the memory destination with implicit LOCK prefix. */
 	ctxt->dst.val = ctxt->src.orig_val;
 	ctxt->lock_prefix = 1;
+	tag_page_table_written_insn(ctxt);
 	return X86EMUL_CONTINUE;
 }

@@ -2610,6 +2613,7 @@ static int em_rdtsc(struct x86_emulate_ctxt *ctxt)
 static int em_mov(struct x86_emulate_ctxt *ctxt)
 {
 	ctxt->dst.val = ctxt->src.val;
+	tag_page_table_written_insn(ctxt);
 	return X86EMUL_CONTINUE;
 }

@@ -4135,6 +4139,7 @@ twobyte_insn:
 		break;
 	case 0xab:
 	      bts:		/* bts */
+		tag_page_table_written_insn(ctxt);
 		emulate_2op_SrcV_nobyte("bts", ctxt->src, ctxt->dst, ctxt->eflags);
 		break;
 	case 0xac: /* shrd imm8, r, r/m */
@@ -4148,6 +4153,7 @@ twobyte_insn:
 		 * Save real source value, then compare EAX against
 		 * destination.
 		 */
+		tag_page_table_written_insn(ctxt);
 		ctxt->src.orig_val = ctxt->src.val;
 		ctxt->src.val = ctxt->regs[VCPU_REGS_RAX];
 		emulate_2op_SrcV("cmp", ctxt->src, ctxt->dst, ctxt->eflags);
@@ -4192,6 +4198,7 @@ twobyte_insn:
 		break;
 	case 0xbb:
 	      btc:		/* btc */
+		tag_page_table_written_insn(ctxt);
 		emulate_2op_SrcV_nobyte("btc", ctxt->src, ctxt->dst, ctxt->eflags);
 		break;
 	case 0xbc: {		/* bsf */
@@ -4235,6 +4242,7 @@ twobyte_insn:
 			(u64) ctxt->src.val;
 		break;
 	case 0xc7:		/* Grp9 (cmpxchg8b) */
+		tag_page_table_written_insn(ctxt);
 		rc = em_grp9(ctxt);
 		break;
 	default:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ea8f9f0..cf6fb29 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4831,6 +4831,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
 	ctxt->interruptibility = 0;
 	ctxt->have_exception = false;
 	ctxt->perm_ok = false;
+	ctxt->page_table_written_insn = false;

 	ctxt->only_vendor_specific_insn
 		= emulation_type & EMULTYPE_TRAP_UD;

diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index d36fe23..b6e868f 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -111,6 +111,11 @@ static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
 	return false;
 }

+static inline void tag_page_table_written_insn(struct x86_emulate_ctxt *ctxt)
+{
+	ctxt->page_table_written_insn = true;
+}
+
 void kvm_before_handle_nmi(struct kvm_vcpu *vcpu);
 void kvm_after_handle_nmi(struct kvm_vcpu *vcpu);
 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip);
The idea is from Avi:

| tag instructions that are typically used to modify the page tables, and drop
| shadow if any other instruction is used
| The list would include, I'd guess, and, or, bts, btc, mov, xchg, cmpxchg, and
| cmpxchg8b

This patch is used to tag the instructions and in the later path, shadow page
is dropped if it is written by other instructions

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
---
 arch/x86/include/asm/kvm_emulate.h |    1 +
 arch/x86/kvm/emulate.c             |    8 ++++++++
 arch/x86/kvm/x86.c                 |    1 +
 arch/x86/kvm/x86.h                 |    5 +++++
 4 files changed, 15 insertions(+), 0 deletions(-)