Message ID | 20160208163526.GI28980@pd.tnic (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 08/02/16 16:35, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 11:31:04AM -0500, Boris Ostrovsky wrote: >> I think we are OK for PV because this code will be executed after pvops are >> set and so we will be calling xen_cpuid(). > Not for the early loader - it is too early for pvops then. So you're > saying something like that won't work? Correct. PV guests are ring-deprivilelged so the cpuid instruction doesn't trap in general. (It does on more modern Intel hardware with cpuid faulting enabled, but that is only IvyBridge and newer). Does the early loader have extable support? If so, this is fairly easy to fix. If not, we have a problem. ~Andrew
On 02/08/2016 11:35 AM, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 11:31:04AM -0500, Boris Ostrovsky wrote: >> I think we are OK for PV because this code will be executed after pvops are >> set and so we will be calling xen_cpuid(). > Not for the early loader - it is too early for pvops then. So you're > saying something like that won't work? Keep in mind that Xen PV doesn't go through startup_32|64(). It starts at xen_start_kernel (save for a small stub before that), which sets pvops. It "joins" regular/baremetal code in i386_start_kernel/x86_64_start_reservation(). -boris
On Mon, Feb 08, 2016 at 04:38:40PM +0000, Andrew Cooper wrote: > Does the early loader have extable support? If so, this is fairly easy > to fix. If not, we have a problem. It doesn't and regardless, you want to have this CPUID querying as simple as possible. No special handling, no special prefixes as it should be able to run on other hypervisors too. If one can't execute a simple CPUID(0x4...) on a xen guest and get the results back, then for early, we will have to do what we've done until now and simply emulate the MSR accesses. Later code can use then xen_cpuid() and all is fine. We should still get rid of paravirt_enabled() though.
On Mon, Feb 08, 2016 at 11:41:16AM -0500, Boris Ostrovsky wrote: > Keep in mind that Xen PV doesn't go through startup_32|64(). It starts at > xen_start_kernel (save for a small stub before that), which sets pvops. It > "joins" regular/baremetal code in > i386_start_kernel/x86_64_start_reservation(). Ah, so one more reason not to do the CPUID thing for the early loader. So it's up to you how to replace that paravirt_enabled() thing - CPUID or test static_cpu_has(X86_FEATURE_XENPV)...
On 02/08/2016 11:45 AM, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 04:38:40PM +0000, Andrew Cooper wrote: >> Does the early loader have extable support? If so, this is fairly easy >> to fix. If not, we have a problem. > It doesn't and regardless, you want to have this CPUID querying as > simple as possible. No special handling, no special prefixes as it > should be able to run on other hypervisors too. > > If one can't execute a simple CPUID(0x4...) on a xen guest and get the > results back, then for early, we will have to do what we've done until > now and simply emulate the MSR accesses. I think xen_hypervisor check can be done in microcode_init() as this is first time PV kernel deals with microcode. Let me try it --- I want to see what happens on hotplug and resume but I am reasonably certain this should work during boot. -boris > > Later code can use then xen_cpuid() and all is fine. We should still get > rid of paravirt_enabled() though. >
On 08/02/16 16:45, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 04:38:40PM +0000, Andrew Cooper wrote: >> Does the early loader have extable support? If so, this is fairly easy >> to fix. If not, we have a problem. > It doesn't and regardless, you want to have this CPUID querying as > simple as possible. No special handling, no special prefixes as it > should be able to run on other hypervisors too. > > If one can't execute a simple CPUID(0x4...) on a xen guest and get the > results back, then for early, we will have to do what we've done until > now and simply emulate the MSR accesses. > > Later code can use then xen_cpuid() and all is fine. We should still get > rid of paravirt_enabled() though. > The force emulation prefix starts with a ud2a instruction, so extable is to prevent it breaking on non-Xen systems. However, if extable isn't available, this point is moot. As an alternative check which should be doable this early on, peeking in the head of hypercall_page should work. If Linux was booted as a PV guest, the hypercall_page will have been constructed by the domain builder, and won't have 0x90's in it. ~Andrew
On Mon, Feb 08, 2016 at 04:53:00PM +0000, Andrew Cooper wrote: > As an alternative check which should be doable this early on, peeking in > the head of hypercall_page should work. If Linux was booted as a PV > guest, the hypercall_page will have been constructed by the domain > builder, and won't have 0x90's in it. Good to know, we might need it for something. :) Thanks.
On 02/08/2016 11:52 AM, Boris Ostrovsky wrote: > > > On 02/08/2016 11:45 AM, Borislav Petkov wrote: >> On Mon, Feb 08, 2016 at 04:38:40PM +0000, Andrew Cooper wrote: >>> Does the early loader have extable support? If so, this is fairly easy >>> to fix. If not, we have a problem. >> It doesn't and regardless, you want to have this CPUID querying as >> simple as possible. No special handling, no special prefixes as it >> should be able to run on other hypervisors too. >> >> If one can't execute a simple CPUID(0x4...) on a xen guest and get the >> results back, then for early, we will have to do what we've done until >> now and simply emulate the MSR accesses. > > I think xen_hypervisor check can be done in microcode_init() as this > is first time PV kernel deals with microcode. > > Let me try it --- I want to see what happens on hotplug and resume but > I am reasonably certain this should work during boot. So it looks like we can just simply revert a18a0f6850 because the very next patch to microcode code (fbae4ba8c4a) makes the original problem (of using __pa_nodebug, which we shouldn't use on PV) go away: we don't call load_ucode_ap from resume path anymore. -boris
On Mon, Feb 08, 2016 at 03:45:21PM -0500, Boris Ostrovsky wrote: > So it looks like we can just simply revert a18a0f6850 because the very next > patch to microcode code (fbae4ba8c4a) makes the original problem (of using > __pa_nodebug, which we shouldn't use on PV) go away: we don't call > load_ucode_ap from resume path anymore. Even better. Care to cut a proper patch explaining why we don't need the paravirt_enabled() check anymore, and prep it ontop of http://git.kernel.org/cgit/linux/kernel/git/bp/bp.git/log/?h=tip-microcode ? I'll queue it for 4.6. Thanks!
On Mon, Feb 08, 2016 at 04:53:00PM +0000, Andrew Cooper wrote: > On 08/02/16 16:45, Borislav Petkov wrote: > > On Mon, Feb 08, 2016 at 04:38:40PM +0000, Andrew Cooper wrote: > >> Does the early loader have extable support? If so, this is fairly easy > >> to fix. If not, we have a problem. > > It doesn't and regardless, you want to have this CPUID querying as > > simple as possible. No special handling, no special prefixes as it > > should be able to run on other hypervisors too. > > > > If one can't execute a simple CPUID(0x4...) on a xen guest and get the > > results back, then for early, we will have to do what we've done until > > now and simply emulate the MSR accesses. > > > > Later code can use then xen_cpuid() and all is fine. We should still get > > rid of paravirt_enabled() though. > > > > The force emulation prefix starts with a ud2a instruction, so extable is > to prevent it breaking on non-Xen systems. However, if extable isn't > available, this point is moot. > > As an alternative check which should be doable this early on, peeking in > the head of hypercall_page should work. If Linux was booted as a PV > guest, the hypercall_page will have been constructed by the domain > builder, and won't have 0x90's in it. Most of the paravirt_enabled() checks can be replaced with a hardware_subarch check once we modify the xen_start_kernel() to add that, today XEN is unused even though it was added eons ago. Part of my work was to remove as many paravirt_enabled() checks. I'll reply to Andy's original e-mail now indicating which ones I could address, it sounds like with this nugget and some other work we might be able to address all. I should note, in the future the check for subarch would be an explicit part of the x86 early init init routines to avoid further issues but only in between x86_64_start_reservations() and setup_arch(). How *early* can such a hypercall _page check be *safely* be called? I say safely here as if we're not on Xen are we OK to muck around and check the same address space? Provided we use the subarch for PV to remove a lot of the paravirt_enabled() checks, is HVMLite still OK if it ends up using PC subarch and there not being a paravirt_enabled() anyamore? Boris O's HVMLite series added paravirt_enabled = 1 for the new HVMLite. Luis
diff --git a/arch/x86/include/asm/hypervisor.h b/arch/x86/include/asm/hypervisor.h index 055ea9941dd5..f3e563e8b5c3 100644 --- a/arch/x86/include/asm/hypervisor.h +++ b/arch/x86/include/asm/hypervisor.h @@ -56,9 +56,11 @@ extern const struct hypervisor_x86 x86_hyper_kvm; extern void init_hypervisor(struct cpuinfo_x86 *c); extern void init_hypervisor_platform(void); extern bool hypervisor_x2apic_available(void); +bool is_xen_hypervisor(void); #else static inline void init_hypervisor(struct cpuinfo_x86 *c) { } static inline void init_hypervisor_platform(void) { } static inline bool hypervisor_x2apic_available(void) { return false; } +static inline bool is_xen_hypervisor(void) { return false; } #endif /* CONFIG_HYPERVISOR_GUEST */ #endif /* _ASM_X86_HYPERVISOR_H */ diff --git a/arch/x86/kernel/cpu/hypervisor.c b/arch/x86/kernel/cpu/hypervisor.c index d820d8eae96b..bda29017f946 100644 --- a/arch/x86/kernel/cpu/hypervisor.c +++ b/arch/x86/kernel/cpu/hypervisor.c @@ -85,3 +85,24 @@ bool __init hypervisor_x2apic_available(void) x86_hyper->x2apic_available && x86_hyper->x2apic_available(); } + +bool is_xen_hypervisor(void) +{ + u32 eax, ebx, ecx, edx; + + eax = 0x4000000; + ecx = 0; + native_cpuid(&eax, &ebx, &ecx, &edx); + + if (ebx == 0x566e6558 && ecx == 0x65584d4d && edx == 0x4d4d566e) + return true; + + eax = 0x40000100; + ecx = 0; + native_cpuid(&eax, &ebx, &ecx, &edx); + + if (ebx == 0x566e6558 && ecx == 0x65584d4d && edx == 0x4d4d566e) + return true; + + return false; +} diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index cea8552e2b3a..0a941ff8095c 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -35,6 +35,7 @@ #include <asm/microcode_intel.h> #include <asm/cpu_device_id.h> #include <asm/microcode_amd.h> +#include <asm/hypervisor.h> #include <asm/perf_event.h> #include <asm/microcode.h> #include <asm/processor.h> @@ -86,7 +87,7 @@ static bool __init check_loader_disabled_bsp(void) bool *res = &dis_ucode_ldr; #endif - if (cmdline_find_option_bool(cmdline, option)) + if (cmdline_find_option_bool(cmdline, option) || is_xen_hypervisor()) *res = true; return *res;