From patchwork Fri Feb 5 13:42:08 2016
X-Patchwork-Submitter: Andrew Cooper
X-Patchwork-Id: 8234751
From: Andrew Cooper
To: Xen-devel
Cc: Andrew Cooper, Jan Beulich
Date: Fri, 5 Feb 2016 13:42:08 +0000
Message-ID: <1454679743-18133-16-git-send-email-andrew.cooper3@citrix.com>
In-Reply-To: <1454679743-18133-1-git-send-email-andrew.cooper3@citrix.com>
References: <1454679743-18133-1-git-send-email-andrew.cooper3@citrix.com>
Subject: [Xen-devel] [PATCH v2 15/30] xen/x86: Improvements to in-hypervisor cpuid sanity checks

 * Use the boot-generated pv and hvm featuresets to clamp the visible
   features, rather than picking and choosing at individual features.  This
   subsumes the static feature manipulation.

 * More use of compiler-visible &'s and |'s, rather than clear/set bit.

 * Remove logic which hides PSE36 out of PAE mode.  This is not how real
   hardware behaves.

 * Improve logic to set OSXSAVE.  The bit is cleared by virtue of not being
   valid in a featureset, and should be a strict fast-forward from %cr4.
   Provide a very big health warning for OSXSAVE for PV guests, which is
   non-architectural.

Signed-off-by: Andrew Cooper
---
CC: Jan Beulich

v2:
 * Reinstate some of the dynamic checks for now.  Future development work will
   instate a complete per-domain policy.
 * Fix OSXSAVE handling for PV guests.
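
For illustration only, and not part of the patch: a minimal standalone sketch of the clamping approach described in the commit message, applied to CPUID leaf 1 ECX. The bit positions are the architectural ones. The names `demo_featureset_1c` and `clamp_leaf1_ecx()` are hypothetical; they stand in for one word of a boot-generated featureset and for the per-leaf logic, and do not exist in Xen. The PV-specific OSXSAVE leak is deliberately not modelled here.

```c
#include <stdint.h>

/* Architectural bit positions in CPUID.1:ECX and in %cr4. */
#define ECX_VMX         (1u << 5)
#define ECX_CX16        (1u << 13)
#define ECX_XSAVE       (1u << 26)
#define ECX_OSXSAVE     (1u << 27)
#define ECX_HYPERVISOR  (1u << 31)
#define CR4_OSXSAVE     (1u << 18)

/*
 * Hypothetical featureset word: the features this guest type may see.
 * OSXSAVE is intentionally absent, so the clamp below always clears it.
 */
static const uint32_t demo_featureset_1c = ECX_CX16 | ECX_XSAVE;

/* Clamp hardware ECX to the featureset, then forward OSXSAVE from %cr4. */
uint32_t clamp_leaf1_ecx(uint32_t hw_ecx, uint32_t guest_cr4)
{
    /* One compiler-visible AND replaces a run of per-bit clears. */
    uint32_t ecx = hw_ecx & demo_featureset_1c;

    /* OSXSAVE is a strict fast-forward of the guest's CR4.OSXSAVE. */
    if ( guest_cr4 & CR4_OSXSAVE )
        ecx |= ECX_OSXSAVE;

    /* The hypervisor bit is always advertised to guests. */
    return ecx | ECX_HYPERVISOR;
}
```

Calling `clamp_leaf1_ecx()` with a hardware value that has VMX and OSXSAVE set and a guest `%cr4` of zero drops both bits; OSXSAVE reappears only once the guest sets CR4.OSXSAVE.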
---
 xen/arch/x86/hvm/hvm.c |  56 +++++++++---------
 xen/arch/x86/traps.c   | 151 ++++++++++++++++++++++++-------------------------
 2 files changed, 100 insertions(+), 107 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 35ec6c9..03b3868 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -71,6 +71,7 @@
 #include
 #include
 #include
+#include
 
 bool_t __read_mostly hvm_enabled;
 
@@ -4617,50 +4618,39 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Fix up VLAPIC details. */
         *ebx &= 0x00FFFFFFu;
         *ebx |= (v->vcpu_id * 2) << 24;
+
+        *ecx &= hvm_featureset[FEATURESET_1c];
+        *edx &= hvm_featureset[FEATURESET_1d];
+
         if ( vlapic_hw_disabled(vcpu_vlapic(v)) )
-            __clear_bit(X86_FEATURE_APIC & 31, edx);
+            *edx &= ~cpufeat_mask(X86_FEATURE_APIC);
 
         /* Fix up OSXSAVE. */
-        if ( cpu_has_xsave )
-            *ecx |= (v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE) ?
-                     cpufeat_mask(X86_FEATURE_OSXSAVE) : 0;
+        if ( v->arch.hvm_vcpu.guest_cr[4] & X86_CR4_OSXSAVE )
+            *ecx |= cpufeat_mask(X86_FEATURE_OSXSAVE);
 
         /* Don't expose PCID to non-hap hvm. */
         if ( !hap_enabled(d) )
             *ecx &= ~cpufeat_mask(X86_FEATURE_PCID);
-
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
         break;
+
     case 0x7:
         if ( count == 0 )
         {
-            if ( !cpu_has_smep )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMEP);
-
-            if ( !cpu_has_smap )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_SMAP);
-
-            /* Don't expose MPX to hvm when VMX support is not available */
-            if ( !(vmx_vmexit_control & VM_EXIT_CLEAR_BNDCFGS) ||
-                 !(vmx_vmentry_control & VM_ENTRY_LOAD_BNDCFGS) )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_MPX);
+            *ebx &= hvm_featureset[FEATURESET_7b0];
+            *ecx &= hvm_featureset[FEATURESET_7c0];
 
             /* Don't expose INVPCID to non-hap hvm. */
             if ( !hap_enabled(d) )
                 *ebx &= ~cpufeat_mask(X86_FEATURE_INVPCID);
-
-            /* Don't expose PCOMMIT to hvm when VMX support is not available */
-            if ( !cpu_has_vmx_pcommit )
-                *ebx &= ~cpufeat_mask(X86_FEATURE_PCOMMIT);
         }
-
         break;
+
     case 0xb:
         /* Fix the x2APIC identifier. */
         *edx = v->vcpu_id * 2;
         break;
+
     case 0xd:
         /* EBX value of main leaf 0 depends on enabled xsave features */
         if ( count == 0 && v->arch.xcr0 )
@@ -4677,9 +4667,12 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
                 *ebx = _eax + _ebx;
             }
         }
+
         if ( count == 1 )
         {
-            if ( cpu_has_xsaves && cpu_has_vmx_xsaves )
+            *eax &= hvm_featureset[FEATURESET_Da1];
+
+            if ( *eax & cpufeat_mask(X86_FEATURE_XSAVES) )
             {
                 *ebx = XSTATE_AREA_MIN_SIZE;
                 if ( v->arch.xcr0 | v->arch.hvm_vcpu.msr_xss )
@@ -4694,6 +4687,9 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         break;
 
     case 0x80000001:
+        *ecx &= hvm_featureset[FEATURESET_e1c];
+        *edx &= hvm_featureset[FEATURESET_e1d];
+
         /* We expose RDTSCP feature to guest only when
            tsc_mode == TSC_MODE_DEFAULT and host_tsc_is_safe() returns 1 */
         if ( d->arch.tsc_mode != TSC_MODE_DEFAULT ||
@@ -4702,12 +4698,10 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         /* Hide 1GB-superpage feature if we can't emulate it. */
         if (!hvm_pse1gb_supported(d))
             *edx &= ~cpufeat_mask(X86_FEATURE_PAGE1GB);
-        /* Only provide PSE36 when guest runs in 32bit PAE or in long mode */
-        if ( !(hvm_pae_enabled(v) || hvm_long_mode_enabled(v)) )
-            *edx &= ~cpufeat_mask(X86_FEATURE_PSE36);
-        /* Hide data breakpoint extensions if the hardware has no support. */
-        if ( !boot_cpu_has(X86_FEATURE_DBEXT) )
-            *ecx &= ~cpufeat_mask(X86_FEATURE_DBEXT);
+        break;
+
+    case 0x80000007:
+        *edx &= hvm_featureset[FEATURESET_e7d];
         break;
 
     case 0x80000008:
@@ -4725,6 +4719,8 @@ void hvm_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx,
         hvm_cpuid(0x80000001, NULL, NULL, NULL, &_edx);
         *eax = (*eax & ~0xffff00) | (_edx & cpufeat_mask(X86_FEATURE_LM)
                                      ? 0x3000 : 0x2000);
+
+        *ebx &= hvm_featureset[FEATURESET_e8b];
         break;
     }
 }
diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c
index 6a181bb..d0f836c 100644
--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -73,6 +73,7 @@
 #include
 #include
 #include
+#include
 #include
 
 /*
@@ -841,69 +842,70 @@ void pv_cpuid(struct cpu_user_regs *regs)
     else
         cpuid_count(leaf, subleaf, &a, &b, &c, &d);
 
-    if ( (leaf & 0x7fffffff) == 0x00000001 )
-    {
-        /* Modify Feature Information. */
-        if ( !cpu_has_apic )
-            __clear_bit(X86_FEATURE_APIC, &d);
-
-        if ( !is_pvh_domain(currd) )
-        {
-            __clear_bit(X86_FEATURE_PSE, &d);
-            __clear_bit(X86_FEATURE_PGE, &d);
-            __clear_bit(X86_FEATURE_PSE36, &d);
-            __clear_bit(X86_FEATURE_VME, &d);
-        }
-    }
-
     switch ( leaf )
     {
     case 0x00000001:
-        /* Modify Feature Information. */
-        if ( !cpu_has_sep )
-            __clear_bit(X86_FEATURE_SEP, &d);
-        __clear_bit(X86_FEATURE_DS, &d);
-        __clear_bit(X86_FEATURE_ACC, &d);
-        __clear_bit(X86_FEATURE_PBE, &d);
-        if ( is_pvh_domain(currd) )
-            __clear_bit(X86_FEATURE_MTRR, &d);
-
-        __clear_bit(X86_FEATURE_DTES64 % 32, &c);
-        __clear_bit(X86_FEATURE_MWAIT % 32, &c);
-        __clear_bit(X86_FEATURE_DSCPL % 32, &c);
-        __clear_bit(X86_FEATURE_VMXE % 32, &c);
-        __clear_bit(X86_FEATURE_SMXE % 32, &c);
-        __clear_bit(X86_FEATURE_TM2 % 32, &c);
+        c &= pv_featureset[FEATURESET_1c];
+        d &= pv_featureset[FEATURESET_1d];
+
         if ( is_pv_32bit_domain(currd) )
-            __clear_bit(X86_FEATURE_CX16 % 32, &c);
-        __clear_bit(X86_FEATURE_XTPR % 32, &c);
-        __clear_bit(X86_FEATURE_PDCM % 32, &c);
-        __clear_bit(X86_FEATURE_PCID % 32, &c);
-        __clear_bit(X86_FEATURE_DCA % 32, &c);
-        if ( !cpu_has_xsave )
-        {
-            __clear_bit(X86_FEATURE_XSAVE % 32, &c);
-            __clear_bit(X86_FEATURE_AVX % 32, &c);
-        }
-        if ( !cpu_has_apic )
-            __clear_bit(X86_FEATURE_X2APIC % 32, &c);
-        __set_bit(X86_FEATURE_HYPERVISOR % 32, &c);
+            c &= ~cpufeat_mask(X86_FEATURE_CX16);
+
+        /*
+         * !!! Warning - OSXSAVE handling for PV guests is non-architectural !!!
+         *
+         * Architecturally, the correct code here is simply:
+         *
+         *   if ( curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE )
+         *       c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+         *
+         * However because of bugs in Xen (before c/s bd19080b, Nov 2010, the
+         * XSAVE cpuid flag leaked into guests despite the feature not being
+         * available for use), buggy workarounds were introduced to Linux (c/s
+         * 947ccf9c, also Nov 2010) which relied on the fact that Xen also
+         * incorrectly leaked OSXSAVE into the guest.
+         *
+         * Furthermore, providing architectural OSXSAVE behaviour to many
+         * Linux PV guests triggered a further kernel bug when the fpu code
+         * observes that XSAVEOPT is available, assumes that xsave state had
+         * been set up for the task, and follows a wild pointer.
+         *
+         * Therefore, the leaking of Xen's OSXSAVE setting has become a
+         * de facto part of the PV ABI and can't reasonably be corrected.
+         *
+         * The following situations and logic now apply:
+         *
+         * - Hardware without CPUID faulting support and native CPUID:
+         *    There is nothing Xen can do here.  The host's XSAVE flag will
+         *    leak through and Xen's OSXSAVE choice will leak through.
+         *
+         *    In the case that the guest kernel has not set up OSXSAVE, only
+         *    SSE will be set in xcr0, and guest userspace can't do too much
+         *    damage itself.
+         *
+         * - Enlightened CPUID or CPUID faulting available:
+         *    Xen can fully control what is seen here.  Guest kernels need to
+         *    see the leaked OSXSAVE, but guest userspace is given
+         *    architectural behaviour, to reflect the guest kernel's
+         *    intentions.
+         */
+        if ( (is_pv_domain(currd) && guest_kernel_mode(curr, regs) &&
+              (this_cpu(cr4) & X86_CR4_OSXSAVE)) ||
+             (curr->arch.pv_vcpu.ctrlreg[4] & X86_CR4_OSXSAVE) )
+            c |= cpufeat_mask(X86_FEATURE_OSXSAVE);
+
+        c |= cpufeat_mask(X86_FEATURE_HYPERVISOR);
         break;
 
     case 0x00000007:
         if ( subleaf == 0 )
-            b &= (cpufeat_mask(X86_FEATURE_BMI1) |
-                  cpufeat_mask(X86_FEATURE_HLE)  |
-                  cpufeat_mask(X86_FEATURE_AVX2) |
-                  cpufeat_mask(X86_FEATURE_BMI2) |
-                  cpufeat_mask(X86_FEATURE_ERMS) |
-                  cpufeat_mask(X86_FEATURE_RTM)  |
-                  cpufeat_mask(X86_FEATURE_RDSEED)  |
-                  cpufeat_mask(X86_FEATURE_ADX)  |
-                  cpufeat_mask(X86_FEATURE_FSGSBASE));
+        {
+            b &= pv_featureset[FEATURESET_7b0];
+            c &= pv_featureset[FEATURESET_7c0];
+        }
         else
-            b = 0;
-        a = c = d = 0;
+            b = c = 0;
+        a = d = 0;
         break;
 
     case XSTATE_CPUID:
@@ -926,37 +928,32 @@ void pv_cpuid(struct cpu_user_regs *regs)
         }
 
         case 1:
-            a &= (boot_cpu_data.x86_capability[cpufeat_word(X86_FEATURE_XSAVEOPT)] &
-                  ~cpufeat_mask(X86_FEATURE_XSAVES));
+            a &= pv_featureset[FEATURESET_Da1];
             b = c = d = 0;
             break;
         }
         break;
 
     case 0x80000001:
-        /* Modify Feature Information. */
+        c &= pv_featureset[FEATURESET_e1c];
+        d &= pv_featureset[FEATURESET_e1d];
+
         if ( is_pv_32bit_domain(currd) )
         {
-            __clear_bit(X86_FEATURE_LM % 32, &d);
-            __clear_bit(X86_FEATURE_LAHF_LM % 32, &c);
-        }
-        if ( is_pv_32bit_domain(currd) &&
-             boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
-            __clear_bit(X86_FEATURE_SYSCALL % 32, &d);
-        __clear_bit(X86_FEATURE_PAGE1GB % 32, &d);
-        __clear_bit(X86_FEATURE_RDTSCP % 32, &d);
-
-        __clear_bit(X86_FEATURE_SVM % 32, &c);
-        if ( !cpu_has_apic )
-            __clear_bit(X86_FEATURE_EXTAPIC % 32, &c);
-        __clear_bit(X86_FEATURE_OSVW % 32, &c);
-        __clear_bit(X86_FEATURE_IBS % 32, &c);
-        __clear_bit(X86_FEATURE_SKINIT % 32, &c);
-        __clear_bit(X86_FEATURE_WDT % 32, &c);
-        __clear_bit(X86_FEATURE_LWP % 32, &c);
-        __clear_bit(X86_FEATURE_NODEID_MSR % 32, &c);
-        __clear_bit(X86_FEATURE_TOPOEXT % 32, &c);
-        __clear_bit(X86_FEATURE_MWAITX % 32, &c);
+            d &= ~cpufeat_mask(X86_FEATURE_LM);
+            c &= ~cpufeat_mask(X86_FEATURE_LAHF_LM);
+
+            if ( boot_cpu_data.x86_vendor != X86_VENDOR_AMD )
+                d &= ~cpufeat_mask(X86_FEATURE_SYSCALL);
+        }
+        break;
+
+    case 0x80000007:
+        d &= pv_featureset[FEATURESET_e7d];
+        break;
+
+    case 0x80000008:
+        b &= pv_featureset[FEATURESET_e8b];
         break;
 
     case 0x0000000a: /* Architectural Performance Monitor Features (Intel) */