Message ID | 20160115213958.GA16118@char.us.oracle.com (mailing list archive) |
---|---|
State | New, archived |
>>> On 15.01.16 at 22:39, <konrad.wilk@oracle.com> wrote:
> On Tue, Jan 12, 2016 at 02:22:03AM -0700, Jan Beulich wrote:
>> Since we can (I hope) pretty much exclude a paging type, the
>> ASSERT() must have triggered because of vapic_pg being NULL.
>> That might be verifiable without extra printk()s, just by checking
>> the disassembly (assuming the value sits in a register). In which
>> case vapic_gpfn would be of interest too.
>
> The vapic_gpfn is 0xffffffffffff.
>
> To be exact:
>
> nvmx_update_virtual_apic_address:vCPU0 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe
>
> Based on this:
>
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index cb6f9b8..8a0abfc 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -695,7 +695,15 @@ static void nvmx_update_virtual_apic_address(struct vcpu *v)
>
>      vapic_gpfn = __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR) >> PAGE_SHIFT;
>      vapic_pg = get_page_from_gfn(v->domain, vapic_gpfn, &p2mt, P2M_ALLOC);
> -    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
> +    if ( !vapic_pg ) {
> +        printk("%s:vCPU%d 0x%lx(vAPIC) 0x%lx(APIC), 0x%lx(TPR) ctrl=%x\n", __func__,v->vcpu_id,
> +            __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR),
> +            __get_vvmcs(nvcpu->nv_vvmcx, APIC_ACCESS_ADDR),
> +            __get_vvmcs(nvcpu->nv_vvmcx, TPR_THRESHOLD),
> +            ctrl);
> +    }
> +    ASSERT(vapic_pg);
> +    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
>      __vmwrite(VIRTUAL_APIC_PAGE_ADDR, page_to_maddr(vapic_pg));
>      put_page(vapic_pg);
>  }

Interesting: I can't see VIRTUAL_APIC_PAGE_ADDR being written
with all ones anywhere, neither for the real VMCS nor for the virtual
one (page_to_maddr() can't, afaict, return such a value). Could you
check where the L1 guest itself is writing that value, or whether it
fails to initialize that field and it happens to start out as all ones?
>> What looks odd to me is the connection between
>> CPU_BASED_TPR_SHADOW being set and the use of a (valid)
>> virtual APIC page: Wouldn't this rather need to depend on
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, just like in
>> nvmx_update_apic_access_address()?
>
> Could be. I added in a read for the secondary control:
>
> nvmx_update_virtual_apic_address:vCPU2 0xffffffffffffffff(vAPIC) 0x0(APIC),
> 0x0(TPR) ctrl=b5b9effe sec=0
>
> So trying your recommendation:
> diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> index cb6f9b8..d291c91 100644
> --- a/xen/arch/x86/hvm/vmx/vvmx.c
> +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> @@ -686,8 +686,8 @@ static void nvmx_update_virtual_apic_address(struct vcpu *v)
>      struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
>      u32 ctrl;
>
> -    ctrl = __n2_exec_control(v);
> -    if ( ctrl & CPU_BASED_TPR_SHADOW )
> +    ctrl = __n2_secondary_exec_control(v);
> +    if ( ctrl & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES )
>      {
>          p2m_type_t p2mt;
>          unsigned long vapic_gpfn;
>
>
> Got me:
> (XEN) stdvga.c:151:d1v0 leaving stdvga mode
> (XEN) stdvga.c:147:d1v0 entering stdvga and caching modes
> (XEN) stdvga.c:520:d1v0 leaving caching mode
> (XEN) vvmx.c:2491:d1v0 Unknown nested vmexit reason 80000021.
> (XEN) Failed vm entry (exit reason 0x80000021) caused by invalid guest state

Interesting. I've just noticed that a similar odd looking (to me)
dependency exists in construct_vmcs(). Perhaps I've overlooked
something in the SDM. In any event I think some words from the
VMX maintainers would be quite nice here. Sadly the VMCS dump
doesn't include the two APIC related addresses...

Jan
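The values reported so far can be decoded mechanically. The C sketch below is illustrative only: the helper names and the `SKETCH_` macros are hypothetical, the control-bit macros reuse Xen's names but are redefined here so the block compiles on its own, the bit positions are those documented in the Intel SDM, and 4 KiB pages are assumed. It shows three things. First, an all-ones VIRTUAL_APIC_PAGE_ADDR shifted right by PAGE_SHIFT gives frame 0xfffffffffffff (52 one bits), and turning any frame back into an address leaves bits 11:0 clear, so a page_to_maddr()-style result can never be all ones. Second, ctrl=b5b9effe has "use TPR shadow" (bit 21) and "activate secondary controls" (bit 31) set, but with sec=0 "virtualize APIC accesses" is off, so the experimental SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES test skips the virtual-APIC update entirely. Third, exit reason 0x80000021 is a VM-entry failure (bit 31) with basic reason 33, invalid guest state, which matches the message Xen printed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86 (Xen's PAGE_SHIFT is 12). */
#define SKETCH_PAGE_SHIFT 12

/* Control bits as documented in the Intel SDM; names mirror Xen's
 * but are redefined here so the sketch is self-contained. */
#define CPU_BASED_TPR_SHADOW                    (1u << 21)
#define CPU_BASED_ACTIVATE_SECONDARY_CONTROLS   (1u << 31)
#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES (1u << 0)

/* Exit-reason layout: bit 31 flags a VM-entry failure, bits 15:0 hold
 * the basic exit reason (33 = VM-entry failure due to invalid guest state). */
#define SKETCH_EXIT_FAILED_VMENTRY      (1u << 31)
#define SKETCH_EXIT_INVALID_GUEST_STATE 33u

/* Same computation as "__get_vvmcs(..., VIRTUAL_APIC_PAGE_ADDR) >> PAGE_SHIFT". */
static uint64_t addr_to_frame(uint64_t addr)
{
    return addr >> SKETCH_PAGE_SHIFT;
}

/* Reverse direction, analogous to what page_to_maddr() yields:
 * the result is always page aligned (bits 11:0 clear). */
static uint64_t frame_to_addr(uint64_t frame)
{
    assert(frame < (UINT64_C(1) << (64 - SKETCH_PAGE_SHIFT)));
    return frame << SKETCH_PAGE_SHIFT;
}

static bool uses_tpr_shadow(uint32_t primary)
{
    return (primary & CPU_BASED_TPR_SHADOW) != 0;
}

/* Secondary controls only take effect when the primary control
 * "activate secondary controls" is also set. */
static bool virtualizes_apic_accesses(uint32_t primary, uint32_t secondary)
{
    return (primary & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS) &&
           (secondary & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES);
}

static bool is_failed_vmentry(uint32_t exit_reason)
{
    return (exit_reason & SKETCH_EXIT_FAILED_VMENTRY) != 0;
}

static uint32_t basic_exit_reason(uint32_t exit_reason)
{
    return exit_reason & 0xffffu;
}
```

For example, `addr_to_frame(UINT64_MAX)` evaluates to 0xfffffffffffff, and `frame_to_addr()` of that frame gives 0xfffffffffffff000 rather than all ones.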
On Mon, Jan 18, 2016 at 02:41:52AM -0700, Jan Beulich wrote:
> >>> On 15.01.16 at 22:39, <konrad.wilk@oracle.com> wrote:
> > On Tue, Jan 12, 2016 at 02:22:03AM -0700, Jan Beulich wrote:
> >> Since we can (I hope) pretty much exclude a paging type, the
> >> ASSERT() must have triggered because of vapic_pg being NULL.
> >> That might be verifiable without extra printk()s, just by checking
> >> the disassembly (assuming the value sits in a register). In which
> >> case vapic_gpfn would be of interest too.
> >
> > The vapic_gpfn is 0xffffffffffff.
> >
> > To be exact:
> >
> > nvmx_update_virtual_apic_address:vCPU0 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe
> >
> > Based on this:
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
> > index cb6f9b8..8a0abfc 100644
> > --- a/xen/arch/x86/hvm/vmx/vvmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vvmx.c
> > @@ -695,7 +695,15 @@ static void nvmx_update_virtual_apic_address(struct vcpu *v)
> >
> >      vapic_gpfn = __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR) >> PAGE_SHIFT;
> >      vapic_pg = get_page_from_gfn(v->domain, vapic_gpfn, &p2mt, P2M_ALLOC);
> > -    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
> > +    if ( !vapic_pg ) {
> > +        printk("%s:vCPU%d 0x%lx(vAPIC) 0x%lx(APIC), 0x%lx(TPR) ctrl=%x\n", __func__,v->vcpu_id,
> > +            __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR),
> > +            __get_vvmcs(nvcpu->nv_vvmcx, APIC_ACCESS_ADDR),
> > +            __get_vvmcs(nvcpu->nv_vvmcx, TPR_THRESHOLD),
> > +            ctrl);
> > +    }
> > +    ASSERT(vapic_pg);
> > +    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
> >      __vmwrite(VIRTUAL_APIC_PAGE_ADDR, page_to_maddr(vapic_pg));
> >      put_page(vapic_pg);
> >  }
>
> Interesting: I can't see VIRTUAL_APIC_PAGE_ADDR being written
> with all ones anywhere, neither for the real VMCS nor for the virtual
> one (page_to_maddr() can't, afaict, return such a value). Could you
> check where the L1 guest itself is writing that value, or whether it
> fails to initialize that field and it happens to start out as all ones?

This is getting more and more bizarre.

I realized that this machine has VMCS shadowing so Xen does not trap on
any vmwrite or vmread. Unless I update the VMCS shadowing bitmap - which
I did for vmwrite and vmread to get a better view of this. It never
traps on VIRTUAL_APIC_PAGE_ADDR accesses. It does trap on:
VIRTUAL_PROCESSOR_ID,
VM_EXIT_MSR_LOAD_ADDR and GUEST_[ES,DS,FS,GS,TR]_SELECTORS.

(It may also trap on IO_BITMAP_A,B but I didn't print that out).

To confirm that the VMCS that will be given to the L2 guest is correct
I added some printing of some states that ought to be pretty OK such
as HOST_RIP or HOST_RSP - which are all 0!

If I let the nvmx_update_virtual_apic_address keep on going without
modifying the VIRTUAL_APIC_PAGE_ADDR it later on crashes the nested
guest:

EN) nvmx_handle_vmwrite: 0
(XEN) nvmx_handle_vmwrite: 0
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 0
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 0
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 2008
(XEN) nvmx_handle_vmwrite: 800
(XEN) nvmx_handle_vmwrite: 804
(XEN) nvmx_handle_vmwrite: 806
(XEN) nvmx_handle_vmwrite: 80a
(XEN) nvmx_handle_vmwrite: 80e
(XEN) nvmx_update_virtual_apic_address: vCPU1 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe sec=0
(XEN) nvmx_update_virtual_apic_address: TPR threshold = 0x0 updated 0.
(XEN) nvmx_update_virtual_apic_address: Virtual APIC = 0x0 updated 0.
(XEN) nvmx_update_virtual_apic_address: APIC address = 0x0 updated 0.
(XEN) HOST_RIP=0x0 HOST_RSP=0x0
(XEN) <vm_launch_fail> error code 7
(XEN) domain_crash_sync called from vmcs.c:1597
(XEN) Domain 1 (vcpu#1) crashed on cpu#37:
(XEN) ----[ Xen-4.6.0 x86_64 debug=n Tainted: C ]----
(XEN) CPU: 37
(XEN) RIP: 0000:[<0000000000000000>]
(XEN) RFLAGS: 0000000000000000 CONTEXT: hvm guest (d1v1)
(XEN) rax: ffff82d08010648b rbx: ffff8340007fb000 rcx: 0000000000000000
(XEN) rdx: ffff82d0801ddf5f rsi: 0000000000000000 rdi: ffff82d0801ebd6a
(XEN) rbp: ffff82d08018cb09 rsp: 0000000000000000 r8: 0000000000000000
(XEN) r9: ffff834007980000 r10: 000000000000063d r11: ffff82d080106465
(XEN) r12: 0000000000000000 r13: 0000000000000000 r14: 0000000000000000
(XEN) r15: ffff834007980000 cr0: 0000000000000010 cr4: 0000000000000000
(XEN) cr3: 00000000efd06000 cr2: 0000000000000000
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: 0000 cs: 0000

which should be no surprise as the VMCS is corrupt.

I need to do some more double-checking to see how it is possible
for this VMCS to get so messed up.

And of course if I run a Xen under Xen with HVM guests - it works fine.
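Assuming the "error code 7" printed by vm_launch_fail is the VM-instruction error field read back after the failed VMLAUNCH (which is what the message suggests), it can be looked up in the SDM's table of VM-instruction error numbers, where 7 means "VM entry with invalid control field(s)". On that reading, L0's own VM entry was rejected by the checks on the VMCS control fields. A minimal lookup sketch; the array and function names are hypothetical and only a handful of the table's entries are included:

```c
#include <assert.h>
#include <stddef.h>

/*
 * A few entries of the SDM's "VM-Instruction Error Numbers" table.
 * Unlisted slots are NULL; this is deliberately not the full table.
 */
static const char *const vm_insn_errors[] = {
    [4]  = "VMLAUNCH with non-clear VMCS",
    [5]  = "VMRESUME with non-launched VMCS",
    [7]  = "VM entry with invalid control field(s)",
    [8]  = "VM entry with invalid host-state field(s)",
    [12] = "VMREAD/VMWRITE from/to unsupported VMCS component",
};

/*
 * Describe the VM-instruction error number read after a VMfailValid.
 * Error numbers start at 1; returns NULL for numbers not listed above.
 */
static const char *vm_insn_error_str(unsigned long err)
{
    assert(err != 0);
    if ( err >= sizeof(vm_insn_errors) / sizeof(vm_insn_errors[0]) )
        return NULL;
    return vm_insn_errors[err];
}
```

A table indexed by error number keeps the lookup trivial; a complete version would simply fill in the remaining SDM entries.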
>>> On 02.02.16 at 23:05, <konrad.wilk@oracle.com> wrote:
> This is getting more and more bizarre.
>
> I realized that this machine has VMCS shadowing so Xen does not trap on
> any vmwrite or vmread. Unless I update the VMCS shadowing bitmap - which
> I did for vmwrite and vmread to get a better view of this. It never
> traps on VIRTUAL_APIC_PAGE_ADDR accesses. It does trap on:
> VIRTUAL_PROCESSOR_ID,
> VM_EXIT_MSR_LOAD_ADDR and GUEST_[ES,DS,FS,GS,TR]_SELECTORS.
>
> (It may also trap on IO_BITMAP_A,B but I didn't print that out).
>
> To confirm that the VMCS that will be given to the L2 guest is correct
> I added some printing of some states that ought to be pretty OK such
> as HOST_RIP or HOST_RSP - which are all 0!

But did you also check what the field of interest starts out as?

> If I let the nvmx_update_virtual_apic_address keep on going without
> modifying the VIRTUAL_APIC_PAGE_ADDR it later on crashes the nested
> guest:
>
> EN) nvmx_handle_vmwrite: 0

The missing characters at the beginning may just be a copy-and-paste
mistake, but they could also indicate a truncated log. Can
you clarify which of the two it is?

> (XEN) nvmx_handle_vmwrite: 0
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 0
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 0
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 2008
> (XEN) nvmx_handle_vmwrite: 800
> (XEN) nvmx_handle_vmwrite: 804
> (XEN) nvmx_handle_vmwrite: 806
> (XEN) nvmx_handle_vmwrite: 80a
> (XEN) nvmx_handle_vmwrite: 80e
> (XEN) nvmx_update_virtual_apic_address: vCPU1 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe sec=0

Assuming the field starts out as other than all ones, could you check
its value on each of the intercepted VMWRITEs, to at least narrow
when it changes.
Kevin, Jun - are there any cases where the hardware would alter
this field's value? Like during some guest side LAPIC manipulations?
(The same monitoring as suggested during VMWRITEs could of
course also be added to LAPIC accesses visible to the hypervisor,
but I guess there won't be too many of those.)

Jan
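Jan's request, checking the field's value on every intercepted VMWRITE, could be instrumented along the lines below. This is a hypothetical, stand-alone C sketch: `struct field_watch`, `watch_check()` and `vmcs_field_name()` are illustrative names rather than Xen functions, and the encodings are a subset of the VMCS field-encoding appendix of the Intel SDM. Decoded this way, the trapped writes in the log are VIRTUAL_PROCESSOR_ID (0), VM_EXIT_MSR_LOAD_ADDR (0x2008) and the guest ES, SS, DS, GS and TR selectors (0x800, 0x804, 0x806, 0x80a, 0x80e); VIRTUAL_APIC_PAGE_ADDR (0x2012) does not appear among them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Subset of VMCS field encodings from the Intel SDM (Appendix B). */
struct vmcs_field_name {
    uint32_t encoding;
    const char *name;
};

static const struct vmcs_field_name known_fields[] = {
    { 0x0000, "VIRTUAL_PROCESSOR_ID" },
    { 0x0800, "GUEST_ES_SELECTOR" },
    { 0x0802, "GUEST_CS_SELECTOR" },
    { 0x0804, "GUEST_SS_SELECTOR" },
    { 0x0806, "GUEST_DS_SELECTOR" },
    { 0x0808, "GUEST_FS_SELECTOR" },
    { 0x080a, "GUEST_GS_SELECTOR" },
    { 0x080c, "GUEST_LDTR_SELECTOR" },
    { 0x080e, "GUEST_TR_SELECTOR" },
    { 0x2000, "IO_BITMAP_A" },
    { 0x2002, "IO_BITMAP_B" },
    { 0x2008, "VM_EXIT_MSR_LOAD_ADDR" },
    { 0x2012, "VIRTUAL_APIC_PAGE_ADDR" },
    { 0x2014, "APIC_ACCESS_ADDR" },
};

/* Map an encoding to a name; "unknown" for encodings not listed above. */
static const char *vmcs_field_name(uint32_t encoding)
{
    for ( size_t i = 0; i < sizeof(known_fields) / sizeof(known_fields[0]); i++ )
        if ( known_fields[i].encoding == encoding )
            return known_fields[i].name;
    return "unknown";
}

/* Hypothetical debug state: the last value seen for the watched field. */
struct field_watch {
    uint64_t last;
    bool seeded;
};

/*
 * Call after every intercepted VMWRITE with the encoding just written and
 * the watched field's current value. Logs and returns true when the value
 * differs from the one recorded at the previous call.
 */
static bool watch_check(struct field_watch *w, uint32_t written, uint64_t current)
{
    bool changed;

    assert(w != NULL);
    changed = w->seeded && current != w->last;
    if ( changed )
        printf("watched field %#" PRIx64 " -> %#" PRIx64
               " after VMWRITE to %s (%#" PRIx32 ")\n",
               w->last, current, vmcs_field_name(written), written);
    w->last = current;
    w->seeded = true;
    return changed;
}
```

In the hypervisor, the natural call site would presumably be the VMWRITE intercept (nvmx_handle_vmwrite() in the log), passing the value read back from the virtual VMCS. With VMCS shadowing enabled, that only covers the fields whose bits are set in the VMWRITE bitmap, as noted earlier in the thread.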
On Wed, Feb 03, 2016 at 02:34:47AM -0700, Jan Beulich wrote:
> >>> On 02.02.16 at 23:05, <konrad.wilk@oracle.com> wrote:
> > This is getting more and more bizarre.
> >
> > I realized that this machine has VMCS shadowing so Xen does not trap on
> > any vmwrite or vmread. Unless I update the VMCS shadowing bitmap - which
> > I did for vmwrite and vmread to get a better view of this. It never
> > traps on VIRTUAL_APIC_PAGE_ADDR accesses. It does trap on:
> > VIRTUAL_PROCESSOR_ID,
> > VM_EXIT_MSR_LOAD_ADDR and GUEST_[ES,DS,FS,GS,TR]_SELECTORS.
> >
> > (It may also trap on IO_BITMAP_A,B but I didn't print that out).
> >
> > To confirm that the VMCS that will be given to the L2 guest is correct
> > I added some printing of some states that ought to be pretty OK such
> > as HOST_RIP or HOST_RSP - which are all 0!
>
> But did you also check what the field of interest starts out as?

I will do that.

>
> > If I let the nvmx_update_virtual_apic_address keep on going without
> > modifying the VIRTUAL_APIC_PAGE_ADDR it later on crashes the nested
> > guest:
> >
> > EN) nvmx_handle_vmwrite: 0
>
> The missing characters at the beginning may just be a copy-and-paste
> mistake, but they could also indicate a truncated log. Can
> you clarify which of the two it is?

Just a copy-and-paste error. Nothing of interest before there:

(d1) NULL
(d1) Booting from Hard Disk...
(d1) Booting from 0000:7c00
(XEN) nvmx_handle_vmwrite: 0
(XEN) nvmx_handle_vmwrite: 0
..
>
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 800
> > (XEN) nvmx_handle_vmwrite: 804
> > (XEN) nvmx_handle_vmwrite: 806
> > (XEN) nvmx_handle_vmwrite: 80a
> > (XEN) nvmx_handle_vmwrite: 80e
> > (XEN) nvmx_update_virtual_apic_address: vCPU1 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe sec=0
>
> Assuming the field starts out as other than all ones, could you check
> its value on each of the intercepted VMWRITEs, to at least narrow
> when it changes.

Yes of course.

>
> Kevin, Jun - are there any cases where the hardware would alter
> this field's value? Like during some guest side LAPIC manipulations?
> (The same monitoring as suggested during VMWRITEs could of
> course also be added to LAPIC accesses visible to the hypervisor,
> but I guess there won't be too many of those.)
>
> Jan
>
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: Wednesday, February 03, 2016 5:35 PM
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 0
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 2008
> > (XEN) nvmx_handle_vmwrite: 800
> > (XEN) nvmx_handle_vmwrite: 804
> > (XEN) nvmx_handle_vmwrite: 806
> > (XEN) nvmx_handle_vmwrite: 80a
> > (XEN) nvmx_handle_vmwrite: 80e
> > (XEN) nvmx_update_virtual_apic_address: vCPU1 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe sec=0
>
> Assuming the field starts out as other than all ones, could you check
> its value on each of the intercepted VMWRITEs, to at least narrow
> when it changes.
>
> Kevin, Jun - are there any cases where the hardware would alter
> this field's value? Like during some guest side LAPIC manipulations?
> (The same monitoring as suggested during VMWRITEs could of
> course also be added to LAPIC accesses visible to the hypervisor,
> but I guess there won't be too many of those.)
>

No such case to my knowledge. But let me confirm with the hardware team.

Thanks
Kevin
> From: Tian, Kevin
> Sent: Thursday, February 04, 2016 1:52 PM
>
> > From: Jan Beulich [mailto:JBeulich@suse.com]
> > Sent: Wednesday, February 03, 2016 5:35 PM
> > > (XEN) nvmx_handle_vmwrite: 0
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 0
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 0
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 2008
> > > (XEN) nvmx_handle_vmwrite: 800
> > > (XEN) nvmx_handle_vmwrite: 804
> > > (XEN) nvmx_handle_vmwrite: 806
> > > (XEN) nvmx_handle_vmwrite: 80a
> > > (XEN) nvmx_handle_vmwrite: 80e
> > > (XEN) nvmx_update_virtual_apic_address: vCPU1 0xffffffffffffffff(vAPIC) 0x0(APIC), 0x0(TPR) ctrl=b5b9effe sec=0
> >
> > Assuming the field starts out as other than all ones, could you check
> > its value on each of the intercepted VMWRITEs, to at least narrow
> > when it changes.
> >
> > Kevin, Jun - are there any cases where the hardware would alter
> > this field's value? Like during some guest side LAPIC manipulations?
> > (The same monitoring as suggested during VMWRITEs could of
> > course also be added to LAPIC accesses visible to the hypervisor,
> > but I guess there won't be too many of those.)
> >
>
> No such case to my knowledge. But let me confirm with the hardware team.
>

Confirmed: no such case.

Thanks
Kevin
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cb6f9b8..8a0abfc 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -695,7 +695,15 @@ static void nvmx_update_virtual_apic_address(struct vcpu *v)

     vapic_gpfn = __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR) >> PAGE_SHIFT;
     vapic_pg = get_page_from_gfn(v->domain, vapic_gpfn, &p2mt, P2M_ALLOC);
-    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
+    if ( !vapic_pg ) {
+        printk("%s:vCPU%d 0x%lx(vAPIC) 0x%lx(APIC), 0x%lx(TPR) ctrl=%x\n", __func__,v->vcpu_id,
+            __get_vvmcs(nvcpu->nv_vvmcx, VIRTUAL_APIC_PAGE_ADDR),
+            __get_vvmcs(nvcpu->nv_vvmcx, APIC_ACCESS_ADDR),
+            __get_vvmcs(nvcpu->nv_vvmcx, TPR_THRESHOLD),
+            ctrl);
+    }
+    ASSERT(vapic_pg);
+    ASSERT(vapic_pg && !p2m_is_paging(p2mt));
     __vmwrite(VIRTUAL_APIC_PAGE_ADDR, page_to_maddr(vapic_pg));
     put_page(vapic_pg);
 }
>
> What looks odd to me is the connection between
> CPU_BASED_TPR_SHADOW being set and the use of a (valid)
> virtual APIC page: Wouldn't this rather need to depend on
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES, just like in
> nvmx_update_apic_access_address()?

Could be. I added in a read for the secondary control:

nvmx_update_virtual_apic_address:vCPU2 0xffffffffffffffff(vAPIC) 0x0(APIC),
0x0(TPR) ctrl=b5b9effe sec=0

So trying your recommendation:
diff --git a/xen/arch/x86/hvm/vmx/vvmx.c b/xen/arch/x86/hvm/vmx/vvmx.c
index cb6f9b8..d291c91 100644
--- a/xen/arch/x86/hvm/vmx/vvmx.c
+++ b/xen/arch/x86/hvm/vmx/vvmx.c
@@ -686,8 +686,8 @@ static void nvmx_update_virtual_apic_address(struct vcpu *v)
     struct nestedvcpu *nvcpu = &vcpu_nestedhvm(v);
     u32 ctrl;

-    ctrl = __n2_exec_control(v);
-    if ( ctrl & CPU_BASED_TPR_SHADOW )
+    ctrl = __n2_secondary_exec_control(v);
+    if ( ctrl & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES )
     {
         p2m_type_t p2mt;
         unsigned long vapic_gpfn;
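The question of which control gates the virtual-APIC page can be cross-checked against the SDM's VM-entry checks on the VM-execution control fields (worth re-verifying against the current manual). As they read, the virtual-APIC address is validated when "use TPR shadow" is 1, and the APIC-access address when "virtualize APIC accesses" is 1; in both cases the address must be 4 KiB aligned and must not set bits beyond the processor's physical-address width. That pairing would line up with the original CPU_BASED_TPR_SHADOW test rather than the experimental one. A sketch of just those two address checks follows; the names are hypothetical, the physical-address width (MAXPHYADDR) is passed in as a parameter, and the SDM's further TPR-shadow checks (for example on the TPR threshold) are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Check applied to both APIC-related addresses: 4 KiB aligned and no
 * bits set at or above the physical-address width (MAXPHYADDR).
 */
static bool vmx_apic_addr_ok(uint64_t addr, unsigned int maxphyaddr)
{
    assert(maxphyaddr > 0 && maxphyaddr < 64);
    if ( addr & 0xfff )
        return false;
    return (addr >> maxphyaddr) == 0;
}

/* Hypothetical snapshot of the relevant controls and fields. */
struct apic_ctrl_sketch {
    bool tpr_shadow;               /* primary control "use TPR shadow" (bit 21) */
    bool virtualize_apic_accesses; /* effective secondary control (bit 0) */
    uint64_t virtual_apic_addr;    /* VIRTUAL_APIC_PAGE_ADDR */
    uint64_t apic_access_addr;     /* APIC_ACCESS_ADDR */
};

/* Each address is only checked when its gating control is set. */
static bool apic_addrs_pass_entry_checks(const struct apic_ctrl_sketch *c,
                                         unsigned int maxphyaddr)
{
    if ( c->tpr_shadow && !vmx_apic_addr_ok(c->virtual_apic_addr, maxphyaddr) )
        return false;
    if ( c->virtualize_apic_accesses &&
         !vmx_apic_addr_ok(c->apic_access_addr, maxphyaddr) )
        return false;
    return true;
}
```

With the value from the log, `vmx_apic_addr_ok(UINT64_MAX, 46)` is false for any width, simply because bits 11:0 are set; the width of 46 is only an example.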