Message ID: 1483163161-2402-6-git-send-email-suravee.suthikulpanit@amd.com (mailing list archive)
State: New, archived
On 31/12/2016 05:45, Suravee Suthikulpanit wrote: > diff --git a/xen/arch/x86/hvm/svm/avic.c b/xen/arch/x86/hvm/svm/avic.c > new file mode 100644 > index 0000000..b62f982 > --- /dev/null > +++ b/xen/arch/x86/hvm/svm/avic.c > @@ -0,0 +1,232 @@ > +/* > + * avic.c: implements AMD Advance Virtual Interrupt Controller (AVIC) support > + * Copyright (c) 2016, Advanced Micro Devices, Inc. > + * > + * This program is free software; you can redistribute it and/or modify it > + * under the terms and conditions of the GNU General Public License, > + * version 2, as published by the Free Software Foundation. > + * > + * This program is distributed in the hope it will be useful, but WITHOUT > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or > + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for > + * more details. > + * > + * You should have received a copy of the GNU General Public License along with > + * this program; If not, see <http://www.gnu.org/licenses/>. > + */ > + > +#include <xen/domain_page.h> > +#include <xen/sched.h> > +#include <xen/stdbool.h> > +#include <asm/acpi.h> > +#include <asm/apicdef.h> > +#include <asm/event.h> > +#include <asm/hvm/emulate.h> > +#include <asm/hvm/nestedhvm.h> > +#include <asm/hvm/support.h> > +#include <asm/hvm/svm/avic.h> > +#include <asm/hvm/vlapic.h> > +#include <asm/p2m.h> > +#include <asm/page.h> > + > +/* > + * Note: Current max index allowed for physical APIC ID table is 255. > + */ > +#define AVIC_PHY_APIC_ID_MAX 0xFF > + > +#define AVIC_DOORBELL 0xc001011b > + > +#define AVIC_HPA_SHIFT 12 > +#define AVIC_HPA_MASK (((1ULL << 40) - 1) << AVIC_HPA_SHIFT) What does this mask represent? I have spotted a truncation bug below, but how to fix the issue depends on the meaning of the mask. The only limits I can see in the spec is that the host physical addresses must be present in system RAM, which presumably means they should follow maxphysaddr like everything else? 
> +#define AVIC_VAPIC_BAR_MASK AVIC_HPA_MASK > + > +/* > + * Note: > + * Currently, svm-avic mode is not supported with nested virtualization. > + * Therefore, it is not yet currently enabled by default. Once the support > + * is in-place, this should be enabled by default. > + */ > +bool svm_avic = 0; > +boolean_param("svm-avic", svm_avic); We are trying to avoid the proliferation of top-level options. Please could you introduce a "svm=" option which takes avic as a sub-boolean, similar to how iommu= currently works? > + > +static struct avic_phy_apic_id_ent * > +avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index) The Physical APIC table is per-domain, not per-cpu according to the spec. This function should take a struct domain, although I don't think you need it at all (see below). > +{ > + struct avic_phy_apic_id_ent *avic_phy_apic_id_table; > + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; > + > + if ( !d->avic_phy_apic_id_table_mfn ) > + return NULL; > + > + /* > + * Note: APIC ID = 0xff is used for broadcast. > + * APIC ID > 0xff is reserved. > + */ > + if ( index >= 0xff ) > + return NULL; > + > + avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn); This is unsafe and will break on hosts with more than 5TB of RAM. As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(), you must use {map,unmap}_domain_page() to get temporary mappings (which will use mfn_to_virt() in the common case), or use {map,unmap}_domain_page_global() to get permanent mappings. As the values in here need modifying across a schedule, I would suggest making a global mapping at create time, and storing the result in a properly-typed pointer. > + > + return &avic_phy_apic_id_table[index]; > +} > + > +int svm_avic_dom_init(struct domain *d) > +{ > + int ret = 0; > + struct page_info *pg; > + unsigned long mfn; mfn_t please, which would also highlight the type error in svm_domain. 
> + > + if ( !svm_avic ) > + return 0; || !has_vlapic(d) HVMLite domains may legitimately be configured without an LAPIC at all. > + > + /* > + * Note: > + * AVIC hardware walks the nested page table to check permissions, > + * but does not use the SPA address specified in the leaf page > + * table entry since it uses address in the AVIC_BACKING_PAGE pointer > + * field of the VMCB. Therefore, we set up a dummy page for APIC _mfn(0). > + */ > + if ( !d->arch.hvm_domain.svm.avic_access_page_done ) > + { > + set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), > + _mfn(0), PAGE_ORDER_4K, > + p2m_get_hostp2m(d)->default_access); > + d->arch.hvm_domain.svm.avic_access_page_done = true; I don't see the purpose of this avic_access_page_done boolean, even in later patches. > + } > + > + /* Init AVIC logical APIC ID table */ > + pg = alloc_domheap_page(d, MEMF_no_owner); > + if ( !pg ) > + { > + gdprintk(XENLOG_ERR, > + "%d: AVIC logical APIC ID table could not be allocated.\n", > + d->domain_id); Do we really need to have a printk() in the -ENOMEM case? I'd personally not bother. 
> + ret = -ENOMEM; > + goto err_out; > + } > + mfn = page_to_mfn(pg); > + clear_domain_page(_mfn(mfn)); > + d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = mfn; > + > + /* Init AVIC physical APIC ID table */ > + pg = alloc_domheap_page(d, MEMF_no_owner); > + if ( !pg ) > + { > + gdprintk(XENLOG_ERR, > + "%d: AVIC physical APIC ID table could not be allocated.\n", > + d->domain_id); > + ret = -ENOMEM; > + goto err_out; > + } > + mfn = page_to_mfn(pg); > + clear_domain_page(_mfn(mfn)); > + d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = mfn; > + > + return ret; > + err_out: > + svm_avic_dom_destroy(d); > + return ret; > +} > + > +void svm_avic_dom_destroy(struct domain *d) > +{ > + if ( !svm_avic ) > + return; > + > + if ( d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn ) > + { > + free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn)); > + d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = 0; > + } > + > + if ( d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn ) > + { > + free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn)); > + d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = 0; > + } > +} > + > +bool svm_avic_vcpu_enabled(const struct vcpu *v) > +{ > + const struct arch_svm_struct *s = &v->arch.hvm_svm; > + const struct vmcb_struct *vmcb = s->vmcb; > + > + return svm_avic && vmcb->_vintr.fields.avic_enable; This predicate should just be on avic_enable. The svm_avic case should prevent the avic_enable from being set in the first place. > +} > + > +static inline u32 * > +avic_get_bk_page_entry(const struct vcpu *v, u32 offset) > +{ > + const struct vlapic *vlapic = vcpu_vlapic(v); > + char *tmp; > + > + if ( !vlapic || !vlapic->regs_page ) > + return NULL; > + > + tmp = (char *)page_to_virt(vlapic->regs_page); > + return (u32 *)(tmp + offset); This is also unsafe over 5TB. vlapic->regs is already a global mapping which you should use. 
Also, you should leave a comment by the allocation of regs_page that AVIC depends on it being a full page allocation. > +} > + > +void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data) > +{ > + const struct arch_svm_struct *s = &v->arch.hvm_svm; > + > + s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK; > + s->vmcb->cleanbits.fields.avic = 0; While I can appreciate what you are trying to do here, altering the physical address in IA32_APICBASE has never functioned properly under Xen, as nothing updates the P2M. Worse, with multi-vcpu guests, the use of a single shared p2m makes this impossible to do properly. I know it is non-architectural behaviour, but IMO vlapic_msr_set() should be updated to raise #GP[0] when attempting to change the frame, to make it fail obviously rather than having interrupts go missing. Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the same mistake. > +} > + > +int svm_avic_init_vmcb(struct vcpu *v) > +{ > + paddr_t ma; > + u32 *apic_id_reg; > + struct arch_svm_struct *s = &v->arch.hvm_svm; > + struct vmcb_struct *vmcb = s->vmcb; > + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; > + const struct vlapic *vlapic = vcpu_vlapic(v); > + struct avic_phy_apic_id_ent entry; > + > + if ( !svm_avic ) > + return 0; > + > + if ( !vlapic || !vlapic->regs_page ) > + return -EINVAL; Somewhere you need a bounds check against the APIC ID being within AVIC limits. > + > + apic_id_reg = avic_get_bk_page_entry(v, APIC_ID); > + if ( !apic_id_reg ) > + return -EINVAL; This should be vlapic_read()... > + > + s->avic_last_phy_id = avic_get_phy_apic_id_ent(v, *apic_id_reg >> 24); And this, GET_xAPIC_ID(), > + if ( !s->avic_last_phy_id ) > + return -EINVAL; Why does a pointer into the physical ID page need storing in arch_svm_struct? > + > + vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK; This use of AVIC_HPA_MASK may truncate the address, which is definitely a problem.
If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to pass appropriate memflags into the alloc_domheap_page() calls to get a suitable page. > + ma = d->avic_log_apic_id_table_mfn; > + vmcb->avic_log_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK; This ma << PAGE_SHIFT also highlights the type error. ma != mfn. > + ma = d->avic_phy_apic_id_table_mfn; > + vmcb->avic_phy_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK; > + vmcb->avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX; Surely this should be some calculation derived from d->max_vcpus? > + > + entry = *(s->avic_last_phy_id); > + smp_rmb(); > + entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT; > + entry.is_running = 0; > + entry.valid = 1; > + *(s->avic_last_phy_id) = entry; > + smp_wmb(); During domain creation, no guests are running, so you can edit this cleanly. What values are actually being excluded here? This, and other patches, look like you are actually just trying to do a read_atomic(), modify, write_atomic() update, rather than actually requiring ordering. > + > + svm_avic_update_vapic_bar(v, APIC_DEFAULT_PHYS_BASE); > + > + vmcb->_vintr.fields.avic_enable = 1; > + > + return 0; > +} > + > +/* > + * Local variables: > + * mode: C > + * c-file-style: "BSD" > + * c-basic-offset: 4 > + * tab-width: 4 > + * indent-tabs-mode: nil > + * End: > + */ > @@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) > > switch ( msr ) > { > + case MSR_IA32_APICBASE: > + if ( svm_avic_vcpu_enabled(v) ) > + svm_avic_update_vapic_bar(v, msr_content); > + break; I think this is dead code. MSR_IA32_APICBASE is handled in hvm_msr_write_intercept(), and won't fall into the vendor hook. If you were to continue down this route, I would add another pi_ops hook, and replace the VMX layering violation in vlapic_msr_set(). 
> case MSR_IA32_SYSENTER_CS: > case MSR_IA32_SYSENTER_ESP: > case MSR_IA32_SYSENTER_EIP: > diff --git a/xen/include/asm-x86/hvm/svm/avic.h b/xen/include/asm-x86/hvm/svm/avic.h > new file mode 100644 > index 0000000..16b433c > --- /dev/null > +++ b/xen/include/asm-x86/hvm/svm/avic.h > @@ -0,0 +1,40 @@ > +#ifndef _SVM_AVIC_H_ > +#define _SVM_AVIC_H_ > + > +enum avic_incmp_ipi_err_code { > + AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE, > + AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN, > + AVIC_INCMP_IPI_ERR_INV_TARGET, > + AVIC_INCMP_IPI_ERR_INV_BK_PAGE, > +}; > + > +struct __attribute__ ((__packed__)) Please include xen/compiler.h and use __packed > +avic_log_apic_id_ent { > + u32 guest_phy_apic_id : 8; > + u32 res : 23; > + u32 valid : 1; > +}; > + > +struct __attribute__ ((__packed__)) > +avic_phy_apic_id_ent { > + u64 host_phy_apic_id : 8; > + u64 res1 : 4; > + u64 bk_pg_ptr : 40; This field should have mfn in its name, as it has an implicit shift in its definition. > + u64 res2 : 10; > + u64 is_running : 1; > + u64 valid : 1; > +}; > <snip> > diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h > index 43cb98e..d3d045f 100644 > --- a/xen/include/asm-x86/hvm/svm/vmcb.h > +++ b/xen/include/asm-x86/hvm/svm/vmcb.h > @@ -496,6 +496,24 @@ struct __packed vmcb_struct { > }; > > struct svm_domain { > + /* > + * This per-domain table is used by the hardware to locate > + * the vAPIC backing page to be used to deliver interrupts > + * based on the guest physical APIC ID. > + */ > + paddr_t avic_phy_apic_id_table_mfn; paddrs and mfns differ by PAGE_SHIFT. Please use mfn_t here. > + > + /* > + * This per-domain table is used by the hardware to map > + * logically addressed interrupt requests (w/ guest logical APIC id) > + * to the guest physical APIC ID. 
> + */ > + paddr_t avic_log_apic_id_table_mfn; > + > + u32 avic_max_vcpu_id; > + bool avic_access_page_done; > + spinlock_t avic_ldr_mode_lock; You introduce the spinlock here, but initialise it in the subsequent patch. I suspect that hunk wants moving into this patch? ~Andrew > + u32 avic_ldr_mode; > }; > > struct arch_svm_struct {
> + > +#define AVIC_DOORBELL 0xc001011b MSR_AVIC_DOORBELL (and it should probably go to include/asm-x86/msr-index.h) > + > +#define AVIC_HPA_SHIFT 12 Is there a reason not to use regular PAGE_SHIFT? > +#define AVIC_HPA_MASK (((1ULL << 40) - 1) << AVIC_HPA_SHIFT) > +#define AVIC_VAPIC_BAR_MASK AVIC_HPA_MASK > + > +/* > + * Note: > + * Currently, svm-avic mode is not supported with nested virtualization. > + * Therefore, it is not yet currently enabled by default. Once the support > + * is in-place, this should be enabled by default. > + */ > +bool svm_avic = 0; > +boolean_param("svm-avic", svm_avic); > + > +static struct avic_phy_apic_id_ent * > +avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index) Something like phys_apicid might be more descriptive. > +{ > + struct avic_phy_apic_id_ent *avic_phy_apic_id_table; > + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; > + > + if ( !d->avic_phy_apic_id_table_mfn ) > + return NULL; > + > + /* > + * Note: APIC ID = 0xff is used for broadcast. > + * APIC ID > 0xff is reserved. > + */ > + if ( index >= 0xff ) > + return NULL; > + > + avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn); > + > + return &avic_phy_apic_id_table[index]; > +} > +} > + > +static inline u32 * > +avic_get_bk_page_entry(const struct vcpu *v, u32 offset) > +{ > + const struct vlapic *vlapic = vcpu_vlapic(v); > + char *tmp; > + > + if ( !vlapic || !vlapic->regs_page ) Should this be changed to an ASSERT? I think this is only called from places that already tested this or from places that you wouldn't get to unless these both were true (like VMEXIT handler). -boris
Hi Andrew On 1/2/17 23:37, Andrew Cooper wrote: > On 31/12/2016 05:45, Suravee Suthikulpanit wrote: >> [...] >> +/* >> + * Note: Current max index allowed for physical APIC ID table is 255. >> + */ >> +#define AVIC_PHY_APIC_ID_MAX 0xFF >> + >> +#define AVIC_DOORBELL 0xc001011b >> + >> +#define AVIC_HPA_SHIFT 12 >> +#define AVIC_HPA_MASK (((1ULL << 40) - 1) << AVIC_HPA_SHIFT) > > What does this mask represent? I have spotted a truncation bug below, > but how to fix the issue depends on the meaning of the mask. > > The only limits I can see in the spec is that the host physical > addresses must be present in system RAM, which presumably means they > should follow maxphysaddr like everything else? Please see below. >> [...] >> +/* >> + * Note: >> + * Currently, svm-avic mode is not supported with nested virtualization. >> + * Therefore, it is not yet currently enabled by default. Once the support >> + * is in-place, this should be enabled by default. >> + */ >> +bool svm_avic = 0; >> +boolean_param("svm-avic", svm_avic); > > We are trying to avoid the proliferation of top-level options. Please > could you introduce a "svm=" option which takes avic as a sub-boolean, > similar to how iommu= currently works? Sure, I will introduce the custom_param("svm", parse_svm_param). >> [...] >> +{ >> + struct avic_phy_apic_id_ent *avic_phy_apic_id_table; >> + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; >> + >> + if ( !d->avic_phy_apic_id_table_mfn ) >> + return NULL; >> + >> + /* >> + * Note: APIC ID = 0xff is used for broadcast. >> + * APIC ID > 0xff is reserved. >> + */ >> + if ( index >= 0xff ) >> + return NULL; >> + >> + avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn); > > This is unsafe and will break on hosts with more than 5TB of RAM. 
> > As you allocate avic_phy_apic_id_table_mfn with alloc_domheap_page(), > you must use {map,unmap}_domain_page() to get temporary mappings (which > will use mfn_to_virt() in the common case), or use > {map,unmap}_domain_page_global() to get permanent mappings. > > As the values in here need modifying across a schedule, I would suggest > making a global mapping at create time, and storing the result in a > properly-typed pointer. Thanks for pointing this out. I'll update both logical and physical APIC ID table to use the {map,unmap}_domain_page_global() instead. >> [...] >> + >> + if ( !svm_avic ) >> + return 0; > > || !has_vlapic(d) > > HVMLite domains may legitimately be configured without an LAPIC at all. > Hm... That means I would also need to check this in svm_avic_dom_destroy() and svm_avic_init_vmcb(). >> [...] >> + >> +static inline u32 * >> +avic_get_bk_page_entry(const struct vcpu *v, u32 offset) >> +{ >> + const struct vlapic *vlapic = vcpu_vlapic(v); >> + char *tmp; >> + >> + if ( !vlapic || !vlapic->regs_page ) >> + return NULL; >> + >> + tmp = (char *)page_to_virt(vlapic->regs_page); >> + return (u32 *)(tmp + offset); > > This is also unsafe over 5TB. vlapic->regs is already a global mapping > which you should use. Good point. > > Also, you should leave a comment by the allocation of regs_page that > AVIC depends on it being a full page allocation. Ok, I will add a comment in arch/x86/hvm/vlapic.c: vlapic_init(). >> +} >> + >> +void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data) >> +{ >> + const struct arch_svm_struct *s = &v->arch.hvm_svm; >> + >> + s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK; >> + s->vmcb->cleanbits.fields.avic = 0; > > While I can appreciate what you are trying to do here, altering the > physical address in IA32_APICBASE has never functioned properly under > Xen, as nothing updates the P2M. Worse, with multi-vcpu guests, the use > of a single shared p2m makes this impossible to do properly. Ahh.. 
Good to know > I know it is non-architectural behaviour, but IMO vlapic_msr_set() > should be updated to raise #GP[0] when attempting to change the frame, > to make it fail obviously rather than having interrupts go missing. > > Also I see that the Intel side (via vmx_vlapic_msr_changed()) makes the > same mistake. Okay, it seems that the hvm_msr_write_intercept() has already handled this, hasn't it? So, I should be able to pretty much remove the handling for the MSR here. >> +} >> + >> +int svm_avic_init_vmcb(struct vcpu *v) >> +{ >> + paddr_t ma; >> + u32 *apic_id_reg; >> + struct arch_svm_struct *s = &v->arch.hvm_svm; >> + struct vmcb_struct *vmcb = s->vmcb; >> + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; >> + const struct vlapic *vlapic = vcpu_vlapic(v); >> + struct avic_phy_apic_id_ent entry; >> + >> + if ( !svm_avic ) >> + return 0; >> + >> + if ( !vlapic || !vlapic->regs_page ) >> + return -EINVAL; > > Somewhere you need a bounds check against the APIC ID being within AVIC > limits. We are checking the bound in avic_get_phy_apic_id_ent(). > >> + if ( !s->avic_last_phy_id ) >> + return -EINVAL; > > Why does a pointer into the physical ID page need storing in > arch_svm_struct? > This was so that we can quickly access the physical APIC ID entry to update it w/o having to call avic_get_phy_apic_id_ent(). >> + >> + vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK; > > This use of AVIC_HPA_MASK may truncate the the address, which is > definitely a problem. I'm not quite sure that I got the truncation that you pointed out. Could you please elaborate? > If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to > pass appropriate memflags into the alloc_domheap_page() calls to get a > suitable page. > If by "maxphysaddr" you mean the 52-bit physical address limit for the PAE mode, then I think that's related.
>> + >> + entry = *(s->avic_last_phy_id); >> + smp_rmb(); >> + entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT; >> + entry.is_running = 0; >> + entry.valid = 1; >> + *(s->avic_last_phy_id) = entry; >> + smp_wmb(); > > During domain creation, no guests are running, so you can edit this cleanly. > > What values are actually being excluded here? This, and other patches, > look like you are actually just trying to do a read_atomic(), modify, > write_atomic() update, rather than actually requiring ordering. > Besides the read_atomic(), modify, write_atomic() update of the entry, I also want to make sure that the compiler won't shuffle the order around, which I thought could be achieved via smp_rmb() and smp_wmb(). >> [...] >> @@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) >> >> switch ( msr ) >> { >> + case MSR_IA32_APICBASE: >> + if ( svm_avic_vcpu_enabled(v) ) >> + svm_avic_update_vapic_bar(v, msr_content); >> + break; > > I think this is dead code. MSR_IA32_APICBASE is handled in > hvm_msr_write_intercept(), and won't fall into the vendor hook. If you > were to continue down this route, I would add another pi_ops hook, and > replace the VMX layering violation in vlapic_msr_set(). > I see now. I will remove the code added here. Thanks, Suravee
On 04/01/17 17:24, Suravee Suthikulpanit wrote: > On 1/2/17 23:37, Andrew Cooper wrote: >>> + >>> + vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & >>> AVIC_HPA_MASK; >> >> This use of AVIC_HPA_MASK may truncate the the address, which is >> definitely a problem. > > I'm not quite sure that I got the truncation that you pointed out. > Could you please elaborate? My apologies. I hadn't considered the PAGE_SHIFT in the definition, and had come to the conclusion you were chopping the physical address off at 2^40. > >> If AVIC_HPA_MASK isn't just related to maxphysaddr, then you need to >> pass appropriate memflags into the alloc_domheap_page() calls to get a >> suitable page. >> > > If by "maxphysaddr" is the 52-bit physical address limit for the PAE > mode, then I think that's related. You can safely rely on page_to_maddr() giving you a sensible value without further masking. By having a struct_page in the first place, it is a known good frame. > >>> + >>> + entry = *(s->avic_last_phy_id); >>> + smp_rmb(); >>> + entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> >>> AVIC_HPA_SHIFT; >>> + entry.is_running = 0; >>> + entry.valid = 1; >>> + *(s->avic_last_phy_id) = entry; >>> + smp_wmb(); >> >> During domain creation, no guests are running, so you can edit this >> cleanly. > > >> What values are actually being excluded here? This, and other patches, >> look like you are actually just trying to do a read_atomic(), modify, >> write_atomic() update, rather than actually requiring ordering. >> > > Besides the read_atomic(), modify write_atomic() to update the entry. > I also want to make sure that the compiler won't shuffle the order > around, which I thought can be achieved via smp_rmb() and smp_wmb(). Shuffle which order? Your smp_rmb() is between two writes, and the smp_wmb() isn't paired with anything. 
I think using read_atomic() and write_atomic() will DTRT for you; All you appear to need is a guarantee that the code won't read/update the live table using multiple accesses. ~Andrew
Hi Andrew, On 1/2/17 23:37, Andrew Cooper wrote: >> + ma = d->avic_phy_apic_id_table_mfn; >> + vmcb->avic_phy_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK; >> + vmcb->avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX; > Surely this should be some calculation derived from d->max_vcpus? This actually should be the value of the highest physical APIC ID, since the documentation mentions that this is the highest index into the physical APIC ID table. I am checking with the HW folks to try to understand if there are any negative effects if we were to always set this to 0xFF. However, I have also been looking into setting/updating this in the following scenarios: 1. During svm_avic_init_vmcb(): We should be able to just have a for_each_vcpu() loop to iterate through all VMCBs and update this field with the max physical APIC ID. 2. During avic_unaccel_trap_write()->avic_handle_apic_id_update(): This is more tricky, since we would need to trap all VCPUs to update this field in all VMCBs to have the same value. What mechanism would you suggest to achieve this? Thanks, Suravee
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index 0138978..1139099 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1457,6 +1457,15 @@ enforces the maximum theoretically necessary timeout of 670ms. Any number is being interpreted as a custom timeout in milliseconds. Zero or boolean false disable the quirk workaround, which is also the default. +### svm\_avic +> `= <boolean>` + +> Default: `false` + +This option enables Advance Virtual Interrupt Controller (AVIC), +which is an extension of AMD Secure Virtual Machine (SVM) to virtualize +the local APIC for guest VMs. + ### sync\_console > `= <boolean>` diff --git a/xen/arch/x86/hvm/svm/Makefile b/xen/arch/x86/hvm/svm/Makefile index 760d295..e0e4a59 100644 --- a/xen/arch/x86/hvm/svm/Makefile +++ b/xen/arch/x86/hvm/svm/Makefile @@ -1,4 +1,5 @@ obj-y += asid.o +obj-y += avic.o obj-y += emulate.o obj-bin-y += entry.o obj-y += intr.o diff --git a/xen/arch/x86/hvm/svm/avic.c b/xen/arch/x86/hvm/svm/avic.c new file mode 100644 index 0000000..b62f982 --- /dev/null +++ b/xen/arch/x86/hvm/svm/avic.c @@ -0,0 +1,232 @@ +/* + * avic.c: implements AMD Advance Virtual Interrupt Controller (AVIC) support + * Copyright (c) 2016, Advanced Micro Devices, Inc. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. + * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + * You should have received a copy of the GNU General Public License along with + * this program; If not, see <http://www.gnu.org/licenses/>.
+ */ + +#include <xen/domain_page.h> +#include <xen/sched.h> +#include <xen/stdbool.h> +#include <asm/acpi.h> +#include <asm/apicdef.h> +#include <asm/event.h> +#include <asm/hvm/emulate.h> +#include <asm/hvm/nestedhvm.h> +#include <asm/hvm/support.h> +#include <asm/hvm/svm/avic.h> +#include <asm/hvm/vlapic.h> +#include <asm/p2m.h> +#include <asm/page.h> + +/* + * Note: Current max index allowed for physical APIC ID table is 255. + */ +#define AVIC_PHY_APIC_ID_MAX 0xFF + +#define AVIC_DOORBELL 0xc001011b + +#define AVIC_HPA_SHIFT 12 +#define AVIC_HPA_MASK (((1ULL << 40) - 1) << AVIC_HPA_SHIFT) +#define AVIC_VAPIC_BAR_MASK AVIC_HPA_MASK + +/* + * Note: + * Currently, svm-avic mode is not supported with nested virtualization. + * Therefore, it is not yet currently enabled by default. Once the support + * is in-place, this should be enabled by default. + */ +bool svm_avic = 0; +boolean_param("svm-avic", svm_avic); + +static struct avic_phy_apic_id_ent * +avic_get_phy_apic_id_ent(const struct vcpu *v, unsigned int index) +{ + struct avic_phy_apic_id_ent *avic_phy_apic_id_table; + struct svm_domain *d = &v->domain->arch.hvm_domain.svm; + + if ( !d->avic_phy_apic_id_table_mfn ) + return NULL; + + /* + * Note: APIC ID = 0xff is used for broadcast. + * APIC ID > 0xff is reserved. + */ + if ( index >= 0xff ) + return NULL; + + avic_phy_apic_id_table = mfn_to_virt(d->avic_phy_apic_id_table_mfn); + + return &avic_phy_apic_id_table[index]; +} + +int svm_avic_dom_init(struct domain *d) +{ + int ret = 0; + struct page_info *pg; + unsigned long mfn; + + if ( !svm_avic ) + return 0; + + /* + * Note: + * AVIC hardware walks the nested page table to check permissions, + * but does not use the SPA address specified in the leaf page + * table entry since it uses address in the AVIC_BACKING_PAGE pointer + * field of the VMCB. Therefore, we set up a dummy page for APIC _mfn(0). 
+ */ + if ( !d->arch.hvm_domain.svm.avic_access_page_done ) + { + set_mmio_p2m_entry(d, paddr_to_pfn(APIC_DEFAULT_PHYS_BASE), + _mfn(0), PAGE_ORDER_4K, + p2m_get_hostp2m(d)->default_access); + d->arch.hvm_domain.svm.avic_access_page_done = true; + } + + /* Init AVIC logical APIC ID table */ + pg = alloc_domheap_page(d, MEMF_no_owner); + if ( !pg ) + { + gdprintk(XENLOG_ERR, + "%d: AVIC logical APIC ID table could not be allocated.\n", + d->domain_id); + ret = -ENOMEM; + goto err_out; + } + mfn = page_to_mfn(pg); + clear_domain_page(_mfn(mfn)); + d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = mfn; + + /* Init AVIC physical APIC ID table */ + pg = alloc_domheap_page(d, MEMF_no_owner); + if ( !pg ) + { + gdprintk(XENLOG_ERR, + "%d: AVIC physical APIC ID table could not be allocated.\n", + d->domain_id); + ret = -ENOMEM; + goto err_out; + } + mfn = page_to_mfn(pg); + clear_domain_page(_mfn(mfn)); + d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = mfn; + + return ret; + err_out: + svm_avic_dom_destroy(d); + return ret; +} + +void svm_avic_dom_destroy(struct domain *d) +{ + if ( !svm_avic ) + return; + + if ( d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn ) + { + free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn)); + d->arch.hvm_domain.svm.avic_phy_apic_id_table_mfn = 0; + } + + if ( d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn ) + { + free_domheap_page(mfn_to_page(d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn)); + d->arch.hvm_domain.svm.avic_log_apic_id_table_mfn = 0; + } +} + +bool svm_avic_vcpu_enabled(const struct vcpu *v) +{ + const struct arch_svm_struct *s = &v->arch.hvm_svm; + const struct vmcb_struct *vmcb = s->vmcb; + + return svm_avic && vmcb->_vintr.fields.avic_enable; +} + +static inline u32 * +avic_get_bk_page_entry(const struct vcpu *v, u32 offset) +{ + const struct vlapic *vlapic = vcpu_vlapic(v); + char *tmp; + + if ( !vlapic || !vlapic->regs_page ) + return NULL; + + tmp = (char 
+    tmp = (char *)page_to_virt(vlapic->regs_page);
+    return (u32 *)(tmp + offset);
+}
+
+void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data)
+{
+    const struct arch_svm_struct *s = &v->arch.hvm_svm;
+
+    s->vmcb->avic_vapic_bar = data & AVIC_VAPIC_BAR_MASK;
+    s->vmcb->cleanbits.fields.avic = 0;
+}
+
+int svm_avic_init_vmcb(struct vcpu *v)
+{
+    paddr_t ma;
+    u32 *apic_id_reg;
+    struct arch_svm_struct *s = &v->arch.hvm_svm;
+    struct vmcb_struct *vmcb = s->vmcb;
+    struct svm_domain *d = &v->domain->arch.hvm_domain.svm;
+    const struct vlapic *vlapic = vcpu_vlapic(v);
+    struct avic_phy_apic_id_ent entry;
+
+    if ( !svm_avic )
+        return 0;
+
+    if ( !vlapic || !vlapic->regs_page )
+        return -EINVAL;
+
+    apic_id_reg = avic_get_bk_page_entry(v, APIC_ID);
+    if ( !apic_id_reg )
+        return -EINVAL;
+
+    s->avic_last_phy_id = avic_get_phy_apic_id_ent(v, *apic_id_reg >> 24);
+    if ( !s->avic_last_phy_id )
+        return -EINVAL;
+
+    vmcb->avic_bk_pg_pa = page_to_maddr(vlapic->regs_page) & AVIC_HPA_MASK;
+    ma = d->avic_log_apic_id_table_mfn;
+    vmcb->avic_log_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;
+    ma = d->avic_phy_apic_id_table_mfn;
+    vmcb->avic_phy_apic_id = (ma << PAGE_SHIFT) & AVIC_HPA_MASK;
+    vmcb->avic_phy_apic_id |= AVIC_PHY_APIC_ID_MAX;
+
+    entry = *(s->avic_last_phy_id);
+    smp_rmb();
+    entry.bk_pg_ptr = (vmcb->avic_bk_pg_pa & AVIC_HPA_MASK) >> AVIC_HPA_SHIFT;
+    entry.is_running = 0;
+    entry.valid = 1;
+    *(s->avic_last_phy_id) = entry;
+    smp_wmb();
+
+    svm_avic_update_vapic_bar(v, APIC_DEFAULT_PHYS_BASE);
+
+    vmcb->_vintr.fields.avic_enable = 1;
+
+    return 0;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 37bd6c4..cef29b2 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -48,6 +48,7 @@
 #include <asm/hvm/svm/asid.h>
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/vmcb.h>
+#include <asm/hvm/svm/avic.h>
 #include <asm/hvm/svm/emulate.h>
 #include <asm/hvm/svm/intr.h>
 #include <asm/hvm/svm/svmdebug.h>
@@ -1162,11 +1163,12 @@ void svm_host_osvw_init()
 
 static int svm_domain_initialise(struct domain *d)
 {
-    return 0;
+    return svm_avic_dom_init(d);
 }
 
 static void svm_domain_destroy(struct domain *d)
 {
+    svm_avic_dom_destroy(d);
 }
 
 static int svm_vcpu_initialise(struct vcpu *v)
@@ -1459,6 +1461,7 @@ const struct hvm_function_table * __init start_svm(void)
     P(cpu_has_svm_decode, "DecodeAssists");
     P(cpu_has_pause_filter, "Pause-Intercept Filter");
     P(cpu_has_tsc_ratio, "TSC Rate MSR");
+    P(cpu_has_svm_avic, "AVIC");
 #undef P
 
     if ( !printed )
@@ -1799,6 +1802,10 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content)
 
     switch ( msr )
     {
+    case MSR_IA32_APICBASE:
+        if ( svm_avic_vcpu_enabled(v) )
+            svm_avic_update_vapic_bar(v, msr_content);
+        break;
     case MSR_IA32_SYSENTER_CS:
     case MSR_IA32_SYSENTER_ESP:
     case MSR_IA32_SYSENTER_EIP:
diff --git a/xen/arch/x86/hvm/svm/vmcb.c b/xen/arch/x86/hvm/svm/vmcb.c
index 9ea014f..9ee7fc7 100644
--- a/xen/arch/x86/hvm/svm/vmcb.c
+++ b/xen/arch/x86/hvm/svm/vmcb.c
@@ -28,6 +28,7 @@
 #include <asm/msr-index.h>
 #include <asm/p2m.h>
 #include <asm/hvm/support.h>
+#include <asm/hvm/svm/avic.h>
 #include <asm/hvm/svm/svm.h>
 #include <asm/hvm/svm/svmdebug.h>
@@ -225,6 +226,8 @@ static int construct_vmcb(struct vcpu *v)
         vmcb->_general1_intercepts |= GENERAL1_INTERCEPT_PAUSE;
     }
 
+    svm_avic_init_vmcb(v);
+
     vmcb->cleanbits.bytes = 0;
 
     return 0;
diff --git a/xen/include/asm-x86/hvm/svm/avic.h b/xen/include/asm-x86/hvm/svm/avic.h
new file mode 100644
index 0000000..16b433c
--- /dev/null
+++ b/xen/include/asm-x86/hvm/svm/avic.h
@@ -0,0 +1,40 @@
+#ifndef _SVM_AVIC_H_
+#define _SVM_AVIC_H_
+
+enum avic_incmp_ipi_err_code {
+    AVIC_INCMP_IPI_ERR_INVALID_INT_TYPE,
+    AVIC_INCMP_IPI_ERR_TARGET_NOT_RUN,
+    AVIC_INCMP_IPI_ERR_INV_TARGET,
+    AVIC_INCMP_IPI_ERR_INV_BK_PAGE,
+};
+
+struct __attribute__ ((__packed__))
+avic_log_apic_id_ent {
+    u32 guest_phy_apic_id : 8;
+    u32 res               : 23;
+    u32 valid             : 1;
+};
+
+struct __attribute__ ((__packed__))
+avic_phy_apic_id_ent {
+    u64 host_phy_apic_id  : 8;
+    u64 res1              : 4;
+    u64 bk_pg_ptr         : 40;
+    u64 res2              : 10;
+    u64 is_running        : 1;
+    u64 valid             : 1;
+};
+
+extern bool_t svm_avic;
+
+int svm_avic_dom_init(struct domain *d);
+void svm_avic_dom_destroy(struct domain *d);
+
+int svm_avic_init_vcpu(struct vcpu *v);
+void svm_avic_destroy_vcpu(struct vcpu *v);
+bool_t svm_avic_vcpu_enabled(const struct vcpu *v);
+
+void svm_avic_update_vapic_bar(const struct vcpu *v, uint64_t data);
+int svm_avic_init_vmcb(struct vcpu *v);
+
+#endif /* _SVM_AVIC_H_ */
diff --git a/xen/include/asm-x86/hvm/svm/svm.h b/xen/include/asm-x86/hvm/svm/svm.h
index c954b7e..fea61bb 100644
--- a/xen/include/asm-x86/hvm/svm/svm.h
+++ b/xen/include/asm-x86/hvm/svm/svm.h
@@ -81,6 +81,7 @@ extern u32 svm_feature_flags;
 #define SVM_FEATURE_FLUSHBYASID    6 /* TLB flush by ASID support */
 #define SVM_FEATURE_DECODEASSISTS  7 /* Decode assists support */
 #define SVM_FEATURE_PAUSEFILTER   10 /* Pause intercept filter support */
+#define SVM_FEATURE_AVIC          13 /* AVIC support */
 
 #define cpu_has_svm_feature(f) test_bit(f, &svm_feature_flags)
 #define cpu_has_svm_npt       cpu_has_svm_feature(SVM_FEATURE_NPT)
@@ -91,6 +92,7 @@ extern u32 svm_feature_flags;
 #define cpu_has_svm_decode    cpu_has_svm_feature(SVM_FEATURE_DECODEASSISTS)
 #define cpu_has_pause_filter  cpu_has_svm_feature(SVM_FEATURE_PAUSEFILTER)
 #define cpu_has_tsc_ratio     cpu_has_svm_feature(SVM_FEATURE_TSCRATEMSR)
+#define cpu_has_svm_avic      cpu_has_svm_feature(SVM_FEATURE_AVIC)
 
 #define SVM_PAUSEFILTER_INIT    3000
 
diff --git a/xen/include/asm-x86/hvm/svm/vmcb.h b/xen/include/asm-x86/hvm/svm/vmcb.h
index 43cb98e..d3d045f 100644
--- a/xen/include/asm-x86/hvm/svm/vmcb.h
+++ b/xen/include/asm-x86/hvm/svm/vmcb.h
@@ -496,6 +496,24 @@ struct __packed vmcb_struct {
 };
 
 struct svm_domain {
+    /*
+     * This per-domain table is used by the hardware to locate
+     * the vAPIC backing page to be used to deliver interrupts
+     * based on the guest physical APIC ID.
+     */
+    paddr_t avic_phy_apic_id_table_mfn;
+
+    /*
+     * This per-domain table is used by the hardware to map
+     * logically addressed interrupt requests (w/ guest logical APIC id)
+     * to the guest physical APIC ID.
+     */
+    paddr_t avic_log_apic_id_table_mfn;
+
+    u32 avic_max_vcpu_id;
+    bool avic_access_page_done;
+    spinlock_t avic_ldr_mode_lock;
+    u32 avic_ldr_mode;
 };
 
 struct arch_svm_struct {
@@ -529,6 +547,9 @@ struct arch_svm_struct {
         u64 length;
         u64 status;
     } osvw;
+
+    struct avic_phy_apic_id_ent *avic_last_phy_id;
+    u32 avic_last_ldr;
 };
 
 struct vmcb_struct *alloc_vmcb(void);
Introduce AVIC base initialization code. This includes:
    * Setting up per-VM data structures.
    * Setting up per-vCPU data structures.
    * Initializing AVIC-related VMCB bit fields.

This patch also introduces a new Xen command-line parameter (svm-avic),
which can be used to enable/disable AVIC support. Currently, svm-avic is
disabled by default since AVIC does not yet support nested virtualization.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Jan Beulich <JBeulich@suse.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
---
 docs/misc/xen-command-line.markdown |   9 ++
 xen/arch/x86/hvm/svm/Makefile       |   1 +
 xen/arch/x86/hvm/svm/avic.c         | 232 ++++++++++++++++++++++++++++++++++++
 xen/arch/x86/hvm/svm/svm.c          |   9 +-
 xen/arch/x86/hvm/svm/vmcb.c         |   3 +
 xen/include/asm-x86/hvm/svm/avic.h  |  40 +++++++
 xen/include/asm-x86/hvm/svm/svm.h   |   2 +
 xen/include/asm-x86/hvm/svm/vmcb.h  |  21 ++++
 8 files changed, 316 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/x86/hvm/svm/avic.c
 create mode 100644 xen/include/asm-x86/hvm/svm/avic.h
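Since the commit message notes that AVIC stays off unless the new svm-avic option is given, a boot-configuration sketch may help. Only the svm-avic token comes from this patch; the menu entry name, kernel paths, and every other option below are placeholders:

```shell
# Hypothetical GRUB2 menu entry enabling AVIC on the Xen command line.
# Only "svm-avic" is introduced by this patch (it defaults to off);
# the paths and remaining options are illustrative placeholders.
menuentry 'Xen (AVIC enabled)' {
    multiboot2 /boot/xen.gz svm-avic console=vga
    module2 /boot/vmlinuz root=/dev/sda1 ro
    module2 /boot/initrd.img
}
```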