
[1/2] VMX: fix VMCS race on context-switch paths

Message ID 1487233771.3032.1.camel@citrix.com (mailing list archive)
State New, archived

Commit Message

Sergey Dyasli Feb. 16, 2017, 8:29 a.m. UTC
On Wed, 2017-02-15 at 08:15 -0700, Jan Beulich wrote:
> > > > On 15.02.17 at 15:55, <sergey.dyasli@citrix.com> wrote:
> > 
> > On Wed, 2017-02-15 at 06:20 -0700, Jan Beulich wrote:
> > > > > > On 15.02.17 at 11:27, <sergey.dyasli@citrix.com> wrote:
> > > > 
> > > > This is what I'm getting during the original test case (32 VMs reboot):
> > > > 
> > > > (XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck!
> > > > (XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local  x86_64  debug=n  Not tainted ]----
> > > 
> > > Hmm, this was with a non-debug build, so the ASSERT() in
> > > vmx_vmcs_reload() was a no-op, yet it would have been useful
> > > to know whether active_cpu was -1 when getting stuck here.
> > > Btw - there was no nested virt in the picture in your try, was
> > > there?
> > 
> > No nested virt is involved in the test case.
> > 
> > Is it worth giving your patch another try with removing ctxt_switch_same()
> > since we figured out that vmx_do_resume() will reload vmcs either way?
> 
> Yes, but that's the cosmetic part, whereas ...
> 
> > And I will also update vmx_vmcs_reload() from your last email.
> 
> ... this looks to be the actual bug fix. If you agree with my
> reasoning of removing the loop altogether, you may want to go
> with that version instead of adding the conditional.

After extensive overnight testing, it seems safe to assume that the
patch below fixes the PML issue. I agree with removing the spinning,
since vmx_vmcs_enter/exit are synchronized with the scheduler by the
schedule_lock. But it costs nothing to check, so I added a debug
message to the loop. Needless to say, it was never printed.
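
For context, the foreign-vCPU path of vmx_vmcs_enter() boils down to
roughly the following (paraphrased from memory rather than quoted, so
the exact code may differ):

    /*
     * Paraphrased sketch of vmx_vmcs_enter()'s foreign-vCPU path
     * (from memory -- the exact code may differ):
     */
    vcpu_pause(v);           /* vcpu_sleep_sync(): wait until v stops running */
    spin_lock(&v->arch.hvm_vmx.vmcs_lock);
    vmx_clear_vmcs(v);       /* IPI the pCPU that still has the VMCS loaded */
    vmx_load_vmcs(v);        /* ... then take it over on this pCPU */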

My patch for vmx_vmcs_exit() is obviously a half measure, because it
doesn't protect against VMCS clearing by an external IPI when current
is idle. I'm not sure whether such a situation is possible, but there
is nothing that prevents it.

This clearly makes your approach superior, and I think you should
submit v2 for proper review.


-- 
Thanks,
Sergey

Comments

Jan Beulich Feb. 16, 2017, 9:26 a.m. UTC | #1
>>> On 16.02.17 at 09:29, <sergey.dyasli@citrix.com> wrote:
> On Wed, 2017-02-15 at 08:15 -0700, Jan Beulich wrote:
>> > > > On 15.02.17 at 15:55, <sergey.dyasli@citrix.com> wrote:
>> > Is it worth giving your patch another try with removing ctxt_switch_same()
>> > since we figured out that vmx_do_resume() will reload vmcs either way?
>> 
>> Yes, but that's the cosmetic part, whereas ...
>> 
>> > And I will also update vmx_vmcs_reload() from your last email.
>> 
>> ... this looks to be the actual bug fix. If you agree with my
>> reasoning of removing the loop altogether, you may want to go
>> with that version instead of adding the conditional.
> 
> After extensive overnight testing, it seems safe to assume that the
> patch below fixes the PML issue. I agree with removing the spinning,
> since vmx_vmcs_enter/exit are synchronized with the scheduler by the
> schedule_lock. But it costs nothing to check, so I added a debug
> message to the loop. Needless to say, it was never printed.

Thanks, that's good to know. I'll remove the loop in v2.

> My patch for vmx_vmcs_exit() is obviously a half measure, because it
> doesn't protect against VMCS clearing by an external IPI when current
> is idle. I'm not sure whether such a situation is possible, but there
> is nothing that prevents it.
> 
> This clearly makes your approach superior, and I think you should
> submit v2 for proper review.

Will do.

Jan

Patch

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 88db7ee..07e8527 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -551,6 +551,33 @@  static void vmx_load_vmcs(struct vcpu *v)
     local_irq_restore(flags);
 }
 
+void vmx_vmcs_reload(struct vcpu *v)
+{
+    /*
+     * As we're running with interrupts disabled, we can't acquire
+     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
+     * the VMCS can't be taken away from us anymore if we still own it.
+     */
+    ASSERT(!local_irq_is_enabled());
+    if ( v->arch.hvm_vmx.vmcs == this_cpu(current_vmcs) )
+        return;
+    ASSERT(!this_cpu(current_vmcs));
+
+    if ( v->arch.hvm_vmx.active_cpu != smp_processor_id() )
+    {
+        /*
+         * Wait for the remote side to be done with the VMCS before loading
+         * it here.
+         */
+        while ( v->arch.hvm_vmx.active_cpu != -1 ) {
+            printk("DS: v->arch.hvm_vmx.active_cpu == %d\n",
+                    v->arch.hvm_vmx.active_cpu);
+            cpu_relax();
+        }
+    }
+    vmx_load_vmcs(v);
+}
+
 int vmx_cpu_up_prepare(unsigned int cpu)
 {
     /*
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 8cafec2..ccf433f 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -734,6 +734,18 @@  static void vmx_ctxt_switch_from(struct vcpu *v)
     if ( unlikely(!this_cpu(vmxon)) )
         return;
 
+    if ( !v->is_running )
+    {
+        /*
+         * When this vCPU isn't marked as running anymore, a remote pCPU's
+         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
+         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
+         * away the VMCS from us. As we're running with interrupts disabled,
+         * we also can't call vmx_vmcs_enter().
+         */
+        vmx_vmcs_reload(v);
+    }
+
     vmx_fpu_leave(v);
     vmx_save_guest_msrs(v);
     vmx_restore_host_msrs();
diff --git a/xen/include/asm-x86/hvm/vmx/vmcs.h b/xen/include/asm-x86/hvm/vmx/vmcs.h
index 5974cce..2bf8829 100644
--- a/xen/include/asm-x86/hvm/vmx/vmcs.h
+++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
@@ -157,6 +157,7 @@  void vmx_destroy_vmcs(struct vcpu *v);
 void vmx_vmcs_enter(struct vcpu *v);
 bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
 void vmx_vmcs_exit(struct vcpu *v);
+void vmx_vmcs_reload(struct vcpu *v);
 
 #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
 #define CPU_BASED_USE_TSC_OFFSETING           0x00000008