Message ID: 1493802603-4978-3-git-send-email-rcojocaru@bitdefender.com (mailing list archive)
State: New, archived
>>> On 03.05.17 at 11:10, <rcojocaru@bitdefender.com> wrote:
> The introspection agent can reply to a vm_event faster than
> vmx_vmexit_handler() can complete in some cases, where it is then
> not safe for vm_event_set_registers() to modify v->arch.user_regs.
> In the test scenario, we were stepping over an INT3 breakpoint by
> setting RIP += 1. The quick reply tended to complete before the VCPU
> triggering the introspection event had properly paused and been
> descheduled. If the reply occurs before __context_switch() happens,
> __context_switch() clobbers the reply by overwriting
> v->arch.user_regs from the stack. If the reply occurs after
> __context_switch(), we don't pass through __context_switch() when
> transitioning to idle.

This last sentence still looks to be wrong (and even self-contradictory).
The reply can't occur after __context_switch() if we don't make it there
in the first place. How about "If we don't pass through
__context_switch() (due to switching to the idle vCPU), reply data
wouldn't be picked up when switching back straight to the original
vCPU"?

> This patch ensures that vm_event_resume() code only sets per-VCPU
> data to be used for the actual setting of registers later in
> hvm_do_resume() (similar to the model used to control setting of CRs
> and MSRs).
>
> The patch additionally removes the sync_vcpu_execstate(v) call from
> vm_event_resume(), which is no longer necessary, which removes the
> associated broadcast TLB flush (read: performance improvement).
>
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Code changes themselves
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan
On 05/03/17 13:01, Jan Beulich wrote:
>>>> On 03.05.17 at 11:10, <rcojocaru@bitdefender.com> wrote:
>> The introspection agent can reply to a vm_event faster than
>> vmx_vmexit_handler() can complete in some cases, where it is then
>> not safe for vm_event_set_registers() to modify v->arch.user_regs.
>> In the test scenario, we were stepping over an INT3 breakpoint by
>> setting RIP += 1. The quick reply tended to complete before the VCPU
>> triggering the introspection event had properly paused and been
>> descheduled. If the reply occurs before __context_switch() happens,
>> __context_switch() clobbers the reply by overwriting
>> v->arch.user_regs from the stack. If the reply occurs after
>> __context_switch(), we don't pass through __context_switch() when
>> transitioning to idle.
>
> This last sentence still looks to be wrong (and even self-contradictory).
> The reply can't occur after __context_switch() if we don't make it there
> in the first place. How about "If we don't pass through
> __context_switch() (due to switching to the idle vCPU), reply data
> wouldn't be picked up when switching back straight to the original
> vCPU"?

Quite right, it's very convoluted. I'll update the comment.

Thanks,
Razvan
On Wed, May 3, 2017 at 5:10 AM, Razvan Cojocaru
<rcojocaru@bitdefender.com> wrote:
>
> The introspection agent can reply to a vm_event faster than
> vmx_vmexit_handler() can complete in some cases, where it is then
> not safe for vm_event_set_registers() to modify v->arch.user_regs.
> In the test scenario, we were stepping over an INT3 breakpoint by
> setting RIP += 1. The quick reply tended to complete before the VCPU
> triggering the introspection event had properly paused and been
> descheduled. If the reply occurs before __context_switch() happens,
> __context_switch() clobbers the reply by overwriting
> v->arch.user_regs from the stack. If the reply occurs after
> __context_switch(), we don't pass through __context_switch() when
> transitioning to idle.
>
> This patch ensures that vm_event_resume() code only sets per-VCPU
> data to be used for the actual setting of registers later in
> hvm_do_resume() (similar to the model used to control setting of CRs
> and MSRs).
>
> The patch additionally removes the sync_vcpu_execstate(v) call from
> vm_event_resume(), which is no longer necessary, which removes the
> associated broadcast TLB flush (read: performance improvement).
>
> Signed-off-by: Razvan Cojocaru <rcojocaru@bitdefender.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Acked-by: Tamas K Lengyel <tamas@tklengyel.com>
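For readers following the INT3 scenario in the commit message, a minimal sketch of the consumer-side reply that exercises this code path is shown below. It is illustrative only: the structure fields and VM_EVENT_* constants come from the public vm_event interface (xen/include/public/vm_event.h), while reply_skip_int3() and put_response() are hypothetical names standing in for the agent's own ring-write and event-channel notification code.

```c
/*
 * Sketch: reply to a software-breakpoint vm_event by stepping over the
 * one-byte INT3 (RIP += 1) and asking Xen to apply the registers.
 */
#include <string.h>

#include <xen/vm_event.h>

void put_response(const vm_event_response_t *rsp); /* hypothetical helper */

static void reply_skip_int3(const vm_event_request_t *req)
{
    vm_event_response_t rsp;

    memset(&rsp, 0, sizeof(rsp));
    rsp.version = VM_EVENT_INTERFACE_VERSION;
    rsp.vcpu_id = req->vcpu_id;
    rsp.reason  = req->reason;   /* VM_EVENT_REASON_SOFTWARE_BREAKPOINT */

    /* Reuse the register snapshot from the request, stepping over INT3. */
    rsp.data.regs.x86 = req->data.regs.x86;
    rsp.data.regs.x86.rip += 1;

    /* Ask Xen to apply the registers and unpause the paused vCPU. */
    rsp.flags = VM_EVENT_FLAG_SET_REGISTERS | VM_EVENT_FLAG_VCPU_PAUSED;

    put_response(&rsp);
}
```

With the patch below applied, such a reply only records the register values in v->arch.vm_event->gprs; they are written into the guest context from hvm_do_resume(), once the target vCPU is running on its own state again, rather than directly into v->arch.user_regs from vm_event_resume().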
diff --git a/xen/arch/x86/hvm/vm_event.c b/xen/arch/x86/hvm/vm_event.c
index b35ee12..bfb95a2 100644
--- a/xen/arch/x86/hvm/vm_event.c
+++ b/xen/arch/x86/hvm/vm_event.c
@@ -23,6 +23,39 @@
 #include <asm/hvm/support.h>
 #include <asm/vm_event.h>
 
+static void hvm_vm_event_set_registers(const struct vcpu *v)
+{
+    ASSERT(v == current);
+
+    if ( unlikely(v->arch.vm_event->set_gprs) )
+    {
+        struct cpu_user_regs *regs = guest_cpu_user_regs();
+
+        regs->rax = v->arch.vm_event->gprs.rax;
+        regs->rbx = v->arch.vm_event->gprs.rbx;
+        regs->rcx = v->arch.vm_event->gprs.rcx;
+        regs->rdx = v->arch.vm_event->gprs.rdx;
+        regs->rsp = v->arch.vm_event->gprs.rsp;
+        regs->rbp = v->arch.vm_event->gprs.rbp;
+        regs->rsi = v->arch.vm_event->gprs.rsi;
+        regs->rdi = v->arch.vm_event->gprs.rdi;
+
+        regs->r8 = v->arch.vm_event->gprs.r8;
+        regs->r9 = v->arch.vm_event->gprs.r9;
+        regs->r10 = v->arch.vm_event->gprs.r10;
+        regs->r11 = v->arch.vm_event->gprs.r11;
+        regs->r12 = v->arch.vm_event->gprs.r12;
+        regs->r13 = v->arch.vm_event->gprs.r13;
+        regs->r14 = v->arch.vm_event->gprs.r14;
+        regs->r15 = v->arch.vm_event->gprs.r15;
+
+        regs->rflags = v->arch.vm_event->gprs.rflags;
+        regs->rip = v->arch.vm_event->gprs.rip;
+
+        v->arch.vm_event->set_gprs = false;
+    }
+}
+
 void hvm_vm_event_do_resume(struct vcpu *v)
 {
     struct monitor_write_data *w;
@@ -30,6 +63,8 @@ void hvm_vm_event_do_resume(struct vcpu *v)
     if ( likely(!v->arch.vm_event) )
         return;
 
+    hvm_vm_event_set_registers(v);
+
     w = &v->arch.vm_event->write_data;
 
     if ( unlikely(v->arch.vm_event->emulate_flags) )
diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 502ccc2..a6ea42c 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -113,26 +113,8 @@ void vm_event_set_registers(struct vcpu *v, vm_event_response_t *rsp)
 {
     ASSERT(atomic_read(&v->vm_event_pause_count));
 
-    v->arch.user_regs.rax = rsp->data.regs.x86.rax;
-    v->arch.user_regs.rbx = rsp->data.regs.x86.rbx;
-    v->arch.user_regs.rcx = rsp->data.regs.x86.rcx;
-    v->arch.user_regs.rdx = rsp->data.regs.x86.rdx;
-    v->arch.user_regs.rsp = rsp->data.regs.x86.rsp;
-    v->arch.user_regs.rbp = rsp->data.regs.x86.rbp;
-    v->arch.user_regs.rsi = rsp->data.regs.x86.rsi;
-    v->arch.user_regs.rdi = rsp->data.regs.x86.rdi;
-
-    v->arch.user_regs.r8 = rsp->data.regs.x86.r8;
-    v->arch.user_regs.r9 = rsp->data.regs.x86.r9;
-    v->arch.user_regs.r10 = rsp->data.regs.x86.r10;
-    v->arch.user_regs.r11 = rsp->data.regs.x86.r11;
-    v->arch.user_regs.r12 = rsp->data.regs.x86.r12;
-    v->arch.user_regs.r13 = rsp->data.regs.x86.r13;
-    v->arch.user_regs.r14 = rsp->data.regs.x86.r14;
-    v->arch.user_regs.r15 = rsp->data.regs.x86.r15;
-
-    v->arch.user_regs.rflags = rsp->data.regs.x86.rflags;
-    v->arch.user_regs.rip = rsp->data.regs.x86.rip;
+    v->arch.vm_event->gprs = rsp->data.regs.x86;
+    v->arch.vm_event->set_gprs = true;
 }
 
 void vm_event_monitor_next_interrupt(struct vcpu *v)
diff --git a/xen/common/vm_event.c b/xen/common/vm_event.c
index 0fe9a53..9291db6 100644
--- a/xen/common/vm_event.c
+++ b/xen/common/vm_event.c
@@ -357,6 +357,16 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
 {
     vm_event_response_t rsp;
 
+    /*
+     * vm_event_resume() runs in either XEN_DOMCTL_VM_EVENT_OP_*, or
+     * EVTCHN_send context from the introspection consumer. Both contexts
+     * are guaranteed not to be the subject of vm_event responses.
+     * While we could ASSERT(v != current) for each VCPU in d in the loop
+     * below, this covers the case where we would need to iterate over all
+     * of them more succinctly.
+     */
+    ASSERT(d != current->domain);
+
     /* Pull all responses off the ring. */
     while ( vm_event_get_response(d, ved, &rsp) )
     {
@@ -375,13 +385,6 @@ void vm_event_resume(struct domain *d, struct vm_event_domain *ved)
         v = d->vcpu[rsp.vcpu_id];
 
         /*
-         * Make sure the vCPU state has been synchronized for the custom
-         * handlers.
-         */
-        if ( atomic_read(&v->vm_event_pause_count) )
-            sync_vcpu_execstate(v);
-
-        /*
          * In some cases the response type needs extra handling, so here
          * we call the appropriate handlers.
          */
diff --git a/xen/include/asm-x86/vm_event.h b/xen/include/asm-x86/vm_event.h
index ca73f99..39e73c8 100644
--- a/xen/include/asm-x86/vm_event.h
+++ b/xen/include/asm-x86/vm_event.h
@@ -32,6 +32,8 @@ struct arch_vm_event {
         struct vm_event_emul_insn_data insn;
     } emul;
     struct monitor_write_data write_data;
+    struct vm_event_regs_x86 gprs;
+    bool set_gprs;
 };
 
 int vm_event_init_domain(struct domain *d);