From patchwork Wed Feb 15 10:27:13 2017
X-Patchwork-Submitter: Sergey Dyasli
X-Patchwork-Id: 9573787
From: Sergey Dyasli
To: "JBeulich@suse.com", "xen-devel@lists.xenproject.org"
Date: Wed, 15 Feb 2017 10:27:13 +0000
Message-ID: <1487154433.3588.1.camel@citrix.com>
References: <58A2E8B70200007800139946@prv-mh.provo.novell.com>
	 <58A2E9F9020000780013995D@prv-mh.provo.novell.com>
In-Reply-To: <58A2E9F9020000780013995D@prv-mh.provo.novell.com>
Cc: Sergey Dyasli, Kevin Tian, "Kevin.Mayer@gdata.de", Andrew Cooper,
	Anshul Makkar, "jun.nakajima@intel.com"
Subject: Re: [Xen-devel] [PATCH 1/2] VMX: fix VMCS race on context-switch paths

On Tue, 2017-02-14 at 03:28 -0700, Jan Beulich wrote:
> When __context_switch() is being bypassed during original context
> switch handling, the vCPU "owning" the VMCS partially loses control of
> it: It will appear non-running to remote CPUs, and hence their attempt
> to pause the owning vCPU will have no effect on it (as it already
> looks to be paused). At the same time the "owning" CPU will re-enable
> interrupts eventually (at the latest when entering the idle loop) and
> hence becomes subject to IPIs from other CPUs requesting access to the
> VMCS. As a result, when __context_switch() finally gets run, the CPU
> may no longer have the VMCS loaded, and hence any accesses to it would
> fail. Hence we may need to re-load the VMCS in vmx_ctxt_switch_from().
>
> Similarly, when __context_switch() is being bypassed also on the second
> (switch-in) path, VMCS ownership may have been lost and hence needs
> re-establishing. Since there's no existing hook to put this in, add a
> new one.
>
> Reported-by: Kevin Mayer
> Reported-by: Anshul Makkar
> Signed-off-by: Jan Beulich
>
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -2098,11 +2098,14 @@ void context_switch(struct vcpu *prev, s
>
>      set_current(next);
>
> -    if ( (per_cpu(curr_vcpu, cpu) == next) ||
> -         (is_idle_domain(nextd) && cpu_online(cpu)) )
> +    if ( (per_cpu(curr_vcpu, cpu) == next) )
>      {
> +        if ( next->arch.ctxt_switch_same )
> +            next->arch.ctxt_switch_same(next);
>          local_irq_enable();
>      }
> +    else if ( is_idle_domain(nextd) && cpu_online(cpu) )
> +        local_irq_enable();
>      else
>      {
>          __context_switch();
> --- a/xen/arch/x86/hvm/vmx/vmcs.c
> +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> @@ -552,6 +552,27 @@ static void vmx_load_vmcs(struct vcpu *v
>      local_irq_restore(flags);
>  }
>
> +void vmx_vmcs_reload(struct vcpu *v)
> +{
> +    /*
> +     * As we're running with interrupts disabled, we can't acquire
> +     * v->arch.hvm_vmx.vmcs_lock here. However, with interrupts disabled
> +     * the VMCS can't be taken away from us anymore if we still own it.
> +     */
> +    ASSERT(!local_irq_is_enabled());
> +    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
> +        return;
> +    ASSERT(!this_cpu(current_vmcs));
> +
> +    /*
> +     * Wait for the remote side to be done with the VMCS before loading
> +     * it here.
> +     */
> +    while ( v->arch.hvm_vmx.active_cpu != -1 )
> +        cpu_relax();
> +    vmx_load_vmcs(v);
> +}
> +
>  int vmx_cpu_up_prepare(unsigned int cpu)
>  {
>      /*
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -298,6 +298,7 @@ static int vmx_vcpu_initialise(struct vc
>      v->arch.schedule_tail = vmx_do_resume;
>      v->arch.ctxt_switch_from = vmx_ctxt_switch_from;
>      v->arch.ctxt_switch_to = vmx_ctxt_switch_to;
> +    v->arch.ctxt_switch_same = vmx_vmcs_reload;
>
>      if ( (rc = vmx_create_vmcs(v)) != 0 )
>      {
> @@ -936,6 +937,18 @@ static void vmx_ctxt_switch_from(struct
>      if ( unlikely(!this_cpu(vmxon)) )
>          return;
>
> +    if ( !v->is_running )
> +    {
> +        /*
> +         * When this vCPU isn't marked as running anymore, a remote pCPU's
> +         * attempt to pause us (from vmx_vmcs_enter()) won't have a reason
> +         * to spin in vcpu_sleep_sync(), and hence that pCPU might have taken
> +         * away the VMCS from us. As we're running with interrupts disabled,
> +         * we also can't call vmx_vmcs_enter().
> +         */
> +        vmx_vmcs_reload(v);
> +    }
> +
>      vmx_fpu_leave(v);
>      vmx_save_guest_msrs(v);
>      vmx_restore_host_msrs();
> --- a/xen/include/asm-x86/domain.h
> +++ b/xen/include/asm-x86/domain.h
> @@ -514,6 +514,7 @@ struct arch_vcpu
>
>      void (*ctxt_switch_from) (struct vcpu *);
>      void (*ctxt_switch_to) (struct vcpu *);
> +    void (*ctxt_switch_same) (struct vcpu *);
>
>      struct vpmu_struct vpmu;
>
> --- a/xen/include/asm-x86/hvm/vmx/vmcs.h
> +++ b/xen/include/asm-x86/hvm/vmx/vmcs.h
> @@ -174,6 +174,7 @@ void vmx_destroy_vmcs(struct vcpu *v);
>  void vmx_vmcs_enter(struct vcpu *v);
>  bool_t __must_check vmx_vmcs_try_enter(struct vcpu *v);
>  void vmx_vmcs_exit(struct vcpu *v);
> +void vmx_vmcs_reload(struct vcpu *v);
>
>  #define CPU_BASED_VIRTUAL_INTR_PENDING        0x00000004
>  #define CPU_BASED_USE_TSC_OFFSETING           0x00000008
>
>
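To restate the race being fixed, here is my own condensed sketch of the
interleaving described above (pCPU numbers are illustrative, not taken from
the patch):

/*
 * pCPU1 (has v's VMCS loaded)          pCPU2
 * ---------------------------          -----
 * context_switch(prev = v, next = idle)
 *   set_current(idle);
 *   __context_switch() is bypassed;
 *   v no longer looks "running",
 *   but its VMCS stays loaded here
 *   local_irq_enable();
 *                                      vmx_vmcs_enter(v):
 *                                        v appears paused, so no waiting;
 *                                        IPI makes pCPU1 flush v's VMCS,
 *                                        then pCPU2 loads it
 * __context_switch() eventually runs:
 *   vmx_ctxt_switch_from(v)
 *     VMCS is gone -> VMREAD/VMWRITE fail
 *     (hence the vmx_vmcs_reload() call added above)
 */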
Using the above patch with the following change for Xen-4.6.1:

-    if ( v->arch.hvm_vmx.vmcs_pa == this_cpu(current_vmcs) )
+    if ( v->arch.hvm_vmx.vmcs == this_cpu(current_vmcs) )

This is what I'm getting during the original test case (32 VMs reboot):

(XEN) [ 1407.789329] Watchdog timer detects that CPU12 is stuck!
(XEN) [ 1407.795726] ----[ Xen-4.6.1-xs-local  x86_64  debug=n  Not tainted ]----
(XEN) [ 1407.803774] CPU:    12
(XEN) [ 1407.806975] RIP:    e008:[] vmx_vmcs_reload+0x32/0x50
(XEN) [ 1407.814926] RFLAGS: 0000000000000013   CONTEXT: hypervisor (d230v0)
(XEN) [ 1407.822486] rax: 0000000000000000   rbx: ffff830079ee7000   rcx: 0000000000000000
(XEN) [ 1407.831407] rdx: 0000006f8f72ce00   rsi: ffff8329b3efbfe8   rdi: ffff830079ee7000
(XEN) [ 1407.840326] rbp: ffff83007bab7000   rsp: ffff83400fab7dc8   r8:  000001468e9e3ccc
(XEN) [ 1407.849246] r9:  ffff83403ffe7000   r10: 00000146c91c1737   r11: ffff833a9558c310
(XEN) [ 1407.858166] r12: ffff833a9558c000   r13: 000000000000000c   r14: ffff83403ffe7000
(XEN) [ 1407.867085] r15: ffff82d080364640   cr0: 0000000080050033   cr4: 00000000003526e0
(XEN) [ 1407.876005] cr3: 000000294b074000   cr2: 000007fefd7ce150
(XEN) [ 1407.882599] ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
(XEN) [ 1407.890938] Xen code around (vmx_vmcs_reload+0x32/0x50):
(XEN) [ 1407.899277]  84 00 00 00 00 00 f3 90 <83> bf e8 05 00 00 ff 75 f5 e9 a0 fa ff ff f3 c3
(XEN) [ 1407.908679] Xen stack trace from rsp=ffff83400fab7dc8:
(XEN) [ 1407.914982]    ffff82d08016c58d 0000000000001000 0000000000000000 0000000000000000
(XEN) [ 1407.923998]    0000000000000206 0000000000000086 0000000000000286 000000000000000c
(XEN) [ 1407.933017]    ffff83007bab7058 ffff82d080364640 ffff83007bab7000 00000146a7f26495
(XEN) [ 1407.942032]    ffff830079ee7000 ffff833a9558cf84 ffff833a9558c000 ffff82d080364640
(XEN) [ 1407.951048]    ffff82d08012fb8e ffff83400fabda98 ffff83400faba148 ffff83403ffe7000
(XEN) [ 1407.960067]    ffff83400faba160 ffff83400fabda40 ffff82d080164305 000000000000000c
(XEN) [ 1407.969083]    ffff830079ee7000 0000000001c9c380 ffff82d080136400 000000440000011d
(XEN) [ 1407.978101]    00000000ffffffff ffffffffffffffff ffff83400fab0000 ffff82d080348d00
(XEN) [ 1407.987116]    ffff833a9558c000 ffff82d080364640 ffff82d08013311c ffff830079ee7000
(XEN) [ 1407.996134]    ffff83400fab0000 ffff830079ee7000 ffff83403ffe7000 00000000ffffffff
(XEN) [ 1408.005151]    ffff82d080167d35 ffff83007bab7000 0000000000000001 fffffa80077f9700
(XEN) [ 1408.014167]    fffffa80075bf900 fffffa80077f9820 0000000000000000 0000000000000000
(XEN) [ 1408.023184]    fffffa8008889c00 0000000002fa1e78 0000003b6ed18d78 0000000000000000
(XEN) [ 1408.032202]    00000000068e7780 fffffa80075ba790 fffffa80077f9848 fffff800027f9e80
(XEN) [ 1408.041220]    0000000000000001 000000fc00000000 fffff880042499c2 0000000000000000
(XEN) [ 1408.050235]    0000000000000246 fffff80000b9cb58 0000000000000000 80248e00e008e1f0
(XEN) [ 1408.059253]    00000000ffff82d0 80248e00e008e200 00000000ffff82d0 80248e000000000c
(XEN) [ 1408.068268]    ffff830079ee7000 0000006f8f72ce00 00000000ffff82d0
(XEN) [ 1408.075638] Xen call trace:
(XEN) [ 1408.079322]    [] vmx_vmcs_reload+0x32/0x50
(XEN) [ 1408.086303]    [] context_switch+0x85d/0xeb0
(XEN) [ 1408.093380]    [] schedule.c#schedule+0x46e/0x7d0
(XEN) [ 1408.100942]    [] reprogram_timer+0x75/0xe0
(XEN) [ 1408.107925]    [] timer.c#timer_softirq_action+0x90/0x210
(XEN) [ 1408.116263]    [] softirq.c#__do_softirq+0x5c/0x90
(XEN) [ 1408.123921]    [] domain.c#idle_loop+0x25/0x60
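The faulting RIP above is inside vmx_vmcs_reload(), and the code bytes in the
trace (f3 90 = pause; 83 bf e8 05 00 00 ff = cmpl $-1,0x5e8(%rdi); 75 f5 = jne)
appear to correspond to the wait loop added by the patch:

    while ( v->arch.hvm_vmx.active_cpu != -1 )
        cpu_relax();

i.e. CPU12 seems to be spinning here because active_cpu never becomes -1.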
Currently I'm testing the following patch that fixes at least one possible
scenario:

commit f76dc832c2de4950872fc27962c56c609cff1160
Author: Sergey Dyasli
Date:   Tue Feb 14 15:23:54 2017 +0000

    x86/vmx: use curr_vcpu in vmx_vmcs_exit()

    During context_switch() from an HVM vCPU to the idle vCPU, current is
    updated but curr_vcpu retains the HVM vCPU's value.  If after that,
    for some reason, a vmx_vmcs_enter() + vmx_vmcs_exit() pair is
    executed, the test for has_hvm_container_vcpu(current) will fail and
    the VMCS for curr_vcpu will not be loaded.

    This will lead to a BUG() during the next __context_switch(), when
    vmx_ctxt_switch_from() is executed and __vmwrite() fails.

    Fix this by testing has_hvm_container_vcpu() against curr_vcpu.

    Signed-off-by: Sergey Dyasli

So far no crashes observed but testing continues.

---
Thanks,
Sergey

diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 88db7ee..f0d71ae 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -742,6 +742,8 @@ void vmx_vmcs_enter(struct vcpu *v)
 void vmx_vmcs_exit(struct vcpu *v)
 {
     struct foreign_vmcs *fv;
+    unsigned int cpu = smp_processor_id();
+    struct vcpu *p = per_cpu(curr_vcpu, cpu);
 
     if ( likely(v == current) )
         return;
@@ -754,8 +756,8 @@ void vmx_vmcs_exit(struct vcpu *v)
     {
         /* Don't confuse vmx_do_resume (for @v or @current!) */
         vmx_clear_vmcs(v);
-        if ( has_hvm_container_vcpu(current) )
-            vmx_load_vmcs(current);
+        if ( has_hvm_container_vcpu(p) )
+            vmx_load_vmcs(p);
 
         spin_unlock(&v->arch.hvm_vmx.vmcs_lock);
         vcpu_unpause(v);
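Spelling out the sequence from the commit message above as a step-by-step
sketch (my own condensed reading, not part of the patch):

/*
 * 1. context_switch(prev = HVM vCPU v, next = idle):
 *        set_current(idle);            // current   = idle vCPU
 *        __context_switch() bypassed;  // curr_vcpu = v, v's VMCS stays loaded
 *
 * 2. vmx_vmcs_enter(w) + vmx_vmcs_exit(w) for some foreign vCPU w:
 *        vmx_clear_vmcs(w);
 *        has_hvm_container_vcpu(current) is false (current == idle),
 *        so curr_vcpu's VMCS is not re-loaded   // the hunk above tests
 *                                               // p (curr_vcpu) instead
 *
 * 3. The next real __context_switch():
 *        vmx_ctxt_switch_from(v) runs with v's VMCS not loaded,
 *        so __vmwrite() fails -> BUG()
 */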