Message ID | 1454733014-15237-4-git-send-email-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > > paravirt_enabled conveys the idea that if this is set or if > paravirt_enabled() returns true you are in a paravirtualized > environment. This is not true by any means, and left as-is > is just causing confusion and is prone to be misused and abused. > > This primitive is really only useful to determine if you have a > paravirtualization hypervisor that supports legacy paravirtualized > guests. At run time, this tells us if we've booted into a Linux guest > with support for legacy devices and features. > > To avoid further issues with semantics on this we loosely borrow > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot > Architecture Flags" section and the PC 2001 definition in the PC > Systems design guide [0]: > > paravirt_legacy() is true if this hypervisor supports legacy > x86 paravirtualized guests. This needs to be far more concrete. I'm reasonably well versed in x86 details relevant to kernels ans I have *no clue* what your semantics mean. > +/** > + * struct pv_info - paravirt hypervisor information > + * > + * @supports_x86_legacy: true if this hypervisor supports legacy x86 > + * paravirtualized guests. The definition of legacy here adheres > + * *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot > + * Architecture Flags" section and the PC 2001 "legacy free" concept [1] > + * referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50 > + * [4]. Legacy x86 guests systems are guest systems which are not "legacy > + * free" as per the PC 2001 definition, and in the ACPI sense could have > + * any of the legacy ACPI IA-PC Boot architecture flags set. These are x86 > + * systems with any type of legacy peripherals or requirements. > + * > + * Examples of some popular legacy peripherals: > + * > + * a) Floppy drive > + * b) Legacy ports [1] such as such as parallel ports, PS/2 connectors, > + * serial ports / RS-232, game ports Parallel ATA, and IEEE 1394 > + * c) ISA bus > + * > + * Examples of features required to support such type of legacy guests > + * are the need for APM and a PNP BIOS. Seriously? I think you just defined every standard native x86 system as well as QEMU/KVM as "legacy". Can we just enumerate this crap? I propose: Xen PV and lguest are paravirt_legacy. Nothing else is paravirt_legacy. The addition of new paravirt_legacy support is strongly discouraged. --Andy
On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote: > On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > > > > paravirt_enabled conveys the idea that if this is set or if > > paravirt_enabled() returns true you are in a paravirtualized > > environment. This is not true by any means, and left as-is > > is just causing confusion and is prone to be misused and abused. > > > > This primitive is really only useful to determine if you have a > > paravirtualization hypervisor that supports legacy paravirtualized > > guests. At run time, this tells us if we've booted into a Linux guest > > with support for legacy devices and features. > > > > To avoid further issues with semantics on this we loosely borrow > > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot > > Architecture Flags" section and the PC 2001 definition in the PC > > Systems design guide [0]: > > > > paravirt_legacy() is true if this hypervisor supports legacy > > x86 paravirtualized guests. > > This needs to be far more concrete. I'm reasonably well versed in x86 > details relevant to kernels ans I have *no clue* what your semantics > mean. Interesting. I'm glad you're chiming in :) > > +/** > > + * struct pv_info - paravirt hypervisor information > > + * > > + * @supports_x86_legacy: true if this hypervisor supports legacy x86 > > + * paravirtualized guests. The definition of legacy here adheres > > + * *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot > > + * Architecture Flags" section and the PC 2001 "legacy free" concept [1] > > + * referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50 > > + * [4]. Legacy x86 guests systems are guest systems which are not "legacy > > + * free" as per the PC 2001 definition, and in the ACPI sense could have > > + * any of the legacy ACPI IA-PC Boot architecture flags set. These are x86 > > + * systems with any type of legacy peripherals or requirements. > > + * > > + * Examples of some popular legacy peripherals: > > + * > > + * a) Floppy drive > > + * b) Legacy ports [1] such as such as parallel ports, PS/2 connectors, > > + * serial ports / RS-232, game ports Parallel ATA, and IEEE 1394 > > + * c) ISA bus > > + * > > + * Examples of features required to support such type of legacy guests > > + * are the need for APM and a PNP BIOS. > > Seriously? > > I think you just defined every standard native x86 system > as well as QEMU/KVM as "legacy". I was a afraid it was too broad, but its why I used "loosely". > Can we just enumerate this crap? I propose: > > Xen PV and lguest are paravirt_legacy. Nothing else is > paravirt_legacy. Fine by me! > The addition of new paravirt_legacy support is > strongly discouraged. Fine by me, but Boris is re-using that for HVMLite on his recently proposed patches. Do we want that ? I'll also note that the goal is to ensure its hardware_subarch will be 0 (PC), meanwhile old PV will be Xen (although this hasn't been set yet). This mess is part of the reason why I do think stronger semantics and clearer definitions will help here. Without strong semantics I can't help address the dead code concerns I've been rambling over. HVMLite is just a rebranding for PVH done right, and then it seems PVH will be dropped and we'll have HVMLite rebranded back to PVH I guess it seems. The enumeration of legacy crap by ACPI boot flags seems to provide enough details to suit our needs if we really wanted to zero down on the specifics of what paravirt_legacy() means, there are these flags: /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */ #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */ #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */ #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */ #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */ #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */ #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */ I checked and I didn't see qemu using any of the ACPI boot flags, but I suspected qemu instances must use a series of legacy crap. Likewise for KVM. coreboot defines legacy free when you don't have any of the above flags set: #define ACPI_FADT_LEGACY_FREE 0x00 /* No legacy devices (including 8042) * Would it be sufficient to just stick to "pv legacy" equivalent of requiring just ACPI_FADT_LEGACY_DEVICES and ACPI_FADT_8042 ? I should point out It turns out ACPI_FADT_NO_CMOS_RTC matches lguest's and it seems that's the only reason we have that RTC PV flag and the features pv field... with the linker table + x86 subarch use it should be relatively simple to remove paravirt_has_feature() paravirt_has() and PV_SUPPORTED_RTC. lguest would just be the only subarch that opts out. Luis
On Sat, Feb 6, 2016 at 12:59 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote: >> On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: >> > >> > paravirt_enabled conveys the idea that if this is set or if >> > paravirt_enabled() returns true you are in a paravirtualized >> > environment. This is not true by any means, and left as-is >> > is just causing confusion and is prone to be misused and abused. >> > >> > This primitive is really only useful to determine if you have a >> > paravirtualization hypervisor that supports legacy paravirtualized >> > guests. At run time, this tells us if we've booted into a Linux guest >> > with support for legacy devices and features. >> > >> > To avoid further issues with semantics on this we loosely borrow >> > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot >> > Architecture Flags" section and the PC 2001 definition in the PC >> > Systems design guide [0]: >> > >> > paravirt_legacy() is true if this hypervisor supports legacy >> > x86 paravirtualized guests. >> >> This needs to be far more concrete. I'm reasonably well versed in x86 >> details relevant to kernels ans I have *no clue* what your semantics >> mean. > > Interesting. I'm glad you're chiming in :) > >> > +/** >> > + * struct pv_info - paravirt hypervisor information >> > + * >> > + * @supports_x86_legacy: true if this hypervisor supports legacy x86 >> > + * paravirtualized guests. The definition of legacy here adheres >> > + * *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot >> > + * Architecture Flags" section and the PC 2001 "legacy free" concept [1] >> > + * referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50 >> > + * [4]. Legacy x86 guests systems are guest systems which are not "legacy >> > + * free" as per the PC 2001 definition, and in the ACPI sense could have >> > + * any of the legacy ACPI IA-PC Boot architecture flags set. These are x86 >> > + * systems with any type of legacy peripherals or requirements. >> > + * >> > + * Examples of some popular legacy peripherals: >> > + * >> > + * a) Floppy drive >> > + * b) Legacy ports [1] such as such as parallel ports, PS/2 connectors, >> > + * serial ports / RS-232, game ports Parallel ATA, and IEEE 1394 >> > + * c) ISA bus >> > + * >> > + * Examples of features required to support such type of legacy guests >> > + * are the need for APM and a PNP BIOS. >> >> Seriously? >> >> I think you just defined every standard native x86 system >> as well as QEMU/KVM as "legacy". > > I was a afraid it was too broad, but its why I used "loosely". > >> Can we just enumerate this crap? I propose: >> >> Xen PV and lguest are paravirt_legacy. Nothing else is >> paravirt_legacy. > > Fine by me! > >> The addition of new paravirt_legacy support is >> strongly discouraged. > > Fine by me, but Boris is re-using that for HVMLite on his recently > proposed patches. Do we want that ? I'll also note that the goal > is to ensure its hardware_subarch will be 0 (PC), meanwhile old > PV will be Xen (although this hasn't been set yet). This mess is > part of the reason why I do think stronger semantics and clearer > definitions will help here. Without strong semantics I can't help > address the dead code concerns I've been rambling over. > > HVMLite is just a rebranding for PVH done right, and then it seems PVH will be > dropped and we'll have HVMLite rebranded back to PVH I guess it seems. PVH/HVMlite had better not be "paravirt" in this sense, I hope. > > The enumeration of legacy crap by ACPI boot flags seems to provide enough > details to suit our needs if we really wanted to zero down on the specifics of > what paravirt_legacy() means, there are these flags: > > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */ > #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */ > #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */ > #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */ > #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */ > #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */ > #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */ > > I checked and I didn't see qemu using any of the ACPI boot flags, > but I suspected qemu instances must use a series of legacy crap. > Likewise for KVM. So shouldn't Linux find out things from the FADT by reading the FADT rather than squishing them into a confusing single function? Anyway, this is all ridiculous. I propose that rather than trying to clean up paravirt_enabled, you just delete it. Here are its users: static inline bool is_hypervisor_range(int idx) { /* * ffff800000000000 - ffff87ffffffffff is reserved for * the hypervisor. */ return paravirt_enabled() && (idx >= pgd_index(__PAGE_OFFSET) - 16) && (idx < pgd_index(__PAGE_OFFSET)); } Nope, wrong. I don't really know what this code is trying to do, but I'm pretty sure it's wrong. Did this mean to check "is Xen PV"? Or was it "is Xen PV or lgeust"? Or what? if (apm_info.bios.version == 0 || paravirt_enabled() || machine_is_olpc()) { printk(KERN_INFO "apm: BIOS not found.\n"); return -ENODEV; } I assume that is trying to avoid checking for APM on systems that are known to be too new. How about cleanup up the condition to check something sensible? if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { static int f00f_workaround_enabled; [...] This is asking "are we the natively booted kernel?". This has nothing to do with paravirt in particular. How about just deleting that code? It seems pointless. Sure, it's the responsibility of the real root kernel, but nothing will break if a guest kernel also does the fixup. int __init microcode_init(void) { [...] if (paravirt_enabled() || dis_ucode_ldr) return -EINVAL; This is also asking "are we the natively booted kernel?" This is plausibly useful for real. (Borislav, is this actually necessary?) Seems to me there should be a function is_native_root_kernel() or similar. Obviously it could have false positives and code will have to deal with that. (This also could be entirely wrong. What code is responsible for CPU microcode updates on Xen? For all I know, dom0 is *supposed* to apply microcode updates, in which case that check really should be deleted. void __init reserve_ebda_region(void) { [...] if (paravirt_enabled()) return; I don't know what the point of this one is. pnpbios turns itself off if paravirt_enabled(). I'm not convinced that's correct. /* only a natively booted kernel should be using TXT */ if (paravirt_enabled()) { pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); return; } Er, what's wrong with trying to talk to tboot on paravirt? It won't be there unless something is rather wrong. In any case, this could use is_native_root_kernel(). if (paravirt_enabled() && !paravirt_has(RTC)) return -ENODEV; This actually seems legit. But how about reversing it: if paravirt_has(NO_RTC) return -ENODEV? Problem solved. paravirt_enabled is also used in entry_32.S: cmpl $0, pv_info+PARAVIRT_enabled This is actually trying to check whether pv_cpu_ops.iret == native_iret. I sincerely hope that no additional support is *ever* added to x86 Linux for systems on which this is not the case. So yeah, I think the right solution is to delete paravirt_enabled. --Andy
On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote: > int __init microcode_init(void) > { > [...] > if (paravirt_enabled() || dis_ucode_ldr) > return -EINVAL; > > This is also asking "are we the natively booted kernel?" This is > plausibly useful for real. (Borislav, is this actually necessary?) There was some breakage on 32-bit pvops with that. > Seems to me there should be a function is_native_root_kernel() or > similar. Obviously it could have false positives and code will have > to deal with that. (This also could be entirely wrong. What code is > responsible for CPU microcode updates on Xen? For all I know, dom0 is > *supposed* to apply microcode updates, in which case that check really > should be deleted. So there are two aspects: - the guest loading the microcode driver. Xen should behave like qemu+kvm does: emulate the MSR accesses the microcode loader does. - microcode application on Xen: we've had this before. The hypervisor should do that (if it doesn't do so already). So yes, that paravirt_enabled() thing should go away. Even more so if we have CPUID leaf 0x4... reserved for hypervisors.
On 02/06/2016 03:05 PM, Andy Lutomirski wrote: > > Anyway, this is all ridiculous. I propose that rather than trying to > clean up paravirt_enabled, you just delete it. Here are its users: > > static inline bool is_hypervisor_range(int idx) > { > /* > * ffff800000000000 - ffff87ffffffffff is reserved for > * the hypervisor. > */ > return paravirt_enabled() && > (idx >= pgd_index(__PAGE_OFFSET) - 16) && > (idx < pgd_index(__PAGE_OFFSET)); > } > > Nope, wrong. I don't really know what this code is trying to do, but > I'm pretty sure it's wrong. Did this mean to check "is Xen PV"? Or > was it "is Xen PV or lgeust"? Or what? This range is reserved for hypervisors but the only hypervisor that uses it is Xen PV (lguest doesn't run in 64-bit mode). The check is intended to catch mappings on baremetal kernels that shouldn't be there. In other words it's not Xen PV that needs it, it's baremetal that wants to catch errors. > > if (apm_info.bios.version == 0 || paravirt_enabled() || > machine_is_olpc()) { > printk(KERN_INFO "apm: BIOS not found.\n"); > return -ENODEV; > } > > I assume that is trying to avoid checking for APM on systems that are > known to be too new. How about cleanup up the condition to check > something sensible? The check here I suspect is unnecessary since version is taken from boot_params and thus should be zero for Xen PV (don't know about lguest but if it's not then we could just set it there). > > if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { > static int f00f_workaround_enabled; > [...] > > This is asking "are we the natively booted kernel?". This has nothing > to do with paravirt in particular. How about just deleting that code? > It seems pointless. Sure, it's the responsibility of the real root > kernel, but nothing will break if a guest kernel also does the fixup. Yes, I also think so. > > int __init microcode_init(void) > { > [...] > if (paravirt_enabled() || dis_ucode_ldr) > return -EINVAL; > > This is also asking "are we the natively booted kernel?" This is > plausibly useful for real. (Borislav, is this actually necessary?) > Seems to me there should be a function is_native_root_kernel() or > similar. Obviously it could have false positives and code will have > to deal with that. (This also could be entirely wrong. What code is > responsible for CPU microcode updates on Xen? For all I know, dom0 is > *supposed* to apply microcode updates, in which case that check really > should be deleted. > > void __init reserve_ebda_region(void) > { > [...] > if (paravirt_enabled()) > return; > > I don't know what the point of this one is. Not sure about this one neither. > > pnpbios turns itself off if paravirt_enabled(). I'm not convinced > that's correct. > > /* only a natively booted kernel should be using TXT */ > if (paravirt_enabled()) { > pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); > return; > } > > Er, what's wrong with trying to talk to tboot on paravirt? It won't > be there unless something is rather wrong. In any case, this could > use is_native_root_kernel(). As in apm_info case, I don't think this check is necessary. > > if (paravirt_enabled() && !paravirt_has(RTC)) > return -ENODEV; > > This actually seems legit. But how about reversing it: if > paravirt_has(NO_RTC) return -ENODEV? Problem solved. > > paravirt_enabled is also used in entry_32.S: > > cmpl $0, pv_info+PARAVIRT_enabled > > This is actually trying to check whether pv_cpu_ops.iret == > native_iret. I sincerely hope that no additional support is *ever* > added to x86 Linux for systems on which this is not the case. I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here. -boris > > So yeah, I think the right solution is to delete paravirt_enabled. > > --Andy
On 02/06/2016 05:04 PM, Borislav Petkov wrote: > On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote: >> int __init microcode_init(void) >> { >> [...] >> if (paravirt_enabled() || dis_ucode_ldr) >> return -EINVAL; >> >> This is also asking "are we the natively booted kernel?" This is >> plausibly useful for real. (Borislav, is this actually necessary?) > There was some breakage on 32-bit pvops with that. > >> Seems to me there should be a function is_native_root_kernel() or >> similar. Obviously it could have false positives and code will have >> to deal with that. (This also could be entirely wrong. What code is >> responsible for CPU microcode updates on Xen? For all I know, dom0 is >> *supposed* to apply microcode updates, in which case that check really >> should be deleted. > So there are two aspects: > > - the guest loading the microcode driver. Xen should behave like > qemu+kvm does: emulate the MSR accesses the microcode loader does. It does. Very much IIRC, the problem was not caused by an access to MSR but rather some sort of address not being available somewhere. > > - microcode application on Xen: we've had this before. The hypervisor > should do that (if it doesn't do so already). it does. > > So yes, that paravirt_enabled() thing should go away. Even more so if we > have CPUID leaf 0x4... reserved for hypervisors. I actually think this was the original proposal until we realized we had paravirt_enabled(). So we can go back to checking CPUID 0x40000000. We might also be able to test for (x86_hyper!=NULL) and have guests that do microcode management prior to init_hypervisor() rely on hypervisors ignoring MSR accesses (as they do today). -boris
On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote: > This range is reserved for hypervisors but the only hypervisor that uses it > is Xen PV (lguest doesn't run in 64-bit mode). Yeah, this is mentioned in arch/x86/include/asm/page_64_types.h: /* * Set __PAGE_OFFSET to the most negative possible address + * PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a * hypervisor to fit. Choosing 16 slots here is arbitrary, but it's * what Xen requires. */ #define __PAGE_OFFSET _AC(0xffff880000000000, UL) It is also in Documentation/x86/x86_64/mm.txt: ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor albeit a bit laconic. Thanks.
On Mon, Feb 08, 2016 at 10:39:43AM -0500, Boris Ostrovsky wrote: > It does. Very much IIRC, the problem was not caused by an access to MSR but > rather some sort of address not being available somewhere. See below. > >- microcode application on Xen: we've had this before. The hypervisor > >should do that (if it doesn't do so already). > > it does. Good. > >So yes, that paravirt_enabled() thing should go away. Even more so if we > >have CPUID leaf 0x4... reserved for hypervisors. > > I actually think this was the original proposal until we realized we had > paravirt_enabled(). So we can go back to checking CPUID 0x40000000. > > We might also be able to test for (x86_hyper!=NULL) and have guests that do > microcode management prior to init_hypervisor() rely on hypervisors ignoring > MSR accesses (as they do today). Right, so the early loader can't do that as on 32-bit it runs even before paging has been enabled. So I *think* the thing with CPUID would be best. What does the xen hypervisor return in regs when I do CPUID(4)? I.e., how do I reliably detect it in the guest? I can whip up a quick patch and get rid of paravirt_enabled() while at it...
On 08/02/16 15:55, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 10:39:43AM -0500, Boris Ostrovsky wrote: >> It does. Very much IIRC, the problem was not caused by an access to MSR but >> rather some sort of address not being available somewhere. > See below. > >>> - microcode application on Xen: we've had this before. The hypervisor >>> should do that (if it doesn't do so already). >> it does. > Good. > >>> So yes, that paravirt_enabled() thing should go away. Even more so if we >>> have CPUID leaf 0x4... reserved for hypervisors. >> I actually think this was the original proposal until we realized we had >> paravirt_enabled(). So we can go back to checking CPUID 0x40000000. >> >> We might also be able to test for (x86_hyper!=NULL) and have guests that do >> microcode management prior to init_hypervisor() rely on hypervisors ignoring >> MSR accesses (as they do today). > Right, so the early loader can't do that as on 32-bit it runs even > before paging has been enabled. So I *think* the thing with CPUID would > be best. What does the xen hypervisor return in regs when I do CPUID(4)? > I.e., how do I reliably detect it in the guest? > > I can whip up a quick patch and get rid of paravirt_enabled() while at > it... > For compatibility with other virtualisation specs, Xen's cpuid leaves shift depending on configuration. Spec at http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33 Basically, they are either at 0x40000000, or 0x40000100 if viridian or vmware compatibility has been enabled. ~Andrew
On 02/08/2016 11:05 AM, Andrew Cooper wrote: > > For compatibility with other virtualisation specs, Xen's cpuid leaves > shift depending on configuration. > > Spec at > http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33 > > Basically, they are either at 0x40000000, or 0x40000100 if viridian or > vmware compatibility has been enabled. (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree). -boris
On 08/02/16 16:12, Boris Ostrovsky wrote: > > > On 02/08/2016 11:05 AM, Andrew Cooper wrote: >> >> For compatibility with other virtualisation specs, Xen's cpuid leaves >> shift depending on configuration. >> >> Spec at >> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33 >> >> >> Basically, they are either at 0x40000000, or 0x40000100 if viridian or >> vmware compatibility has been enabled. > > (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree). The other complication is that, for a PV guest, you must use the forced emulation prefix for the cpuid instruction to trap into the hypervisor. This either means you must be sure you are under Xen to start with, or use an extable entry to catch the #UD fault if you are not under Xen. ~Andrew
On 02/08/2016 11:26 AM, Andrew Cooper wrote: > On 08/02/16 16:12, Boris Ostrovsky wrote: >> >> On 02/08/2016 11:05 AM, Andrew Cooper wrote: >>> For compatibility with other virtualisation specs, Xen's cpuid leaves >>> shift depending on configuration. >>> >>> Spec at >>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33 >>> >>> >>> Basically, they are either at 0x40000000, or 0x40000100 if viridian or >>> vmware compatibility has been enabled. >> (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree). > The other complication is that, for a PV guest, you must use the forced > emulation prefix for the cpuid instruction to trap into the hypervisor. > > This either means you must be sure you are under Xen to start with, or > use an extable entry to catch the #UD fault if you are not under Xen. I think we are OK for PV because this code will be executed after pvops are set and so we will be calling xen_cpuid(). -boris
On 08/02/16 16:31, Boris Ostrovsky wrote: > > > On 02/08/2016 11:26 AM, Andrew Cooper wrote: >> On 08/02/16 16:12, Boris Ostrovsky wrote: >>> >>> On 02/08/2016 11:05 AM, Andrew Cooper wrote: >>>> For compatibility with other virtualisation specs, Xen's cpuid leaves >>>> shift depending on configuration. >>>> >>>> Spec at >>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33 >>>> >>>> >>>> >>>> Basically, they are either at 0x40000000, or 0x40000100 if viridian or >>>> vmware compatibility has been enabled. >>> (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree). >> The other complication is that, for a PV guest, you must use the forced >> emulation prefix for the cpuid instruction to trap into the hypervisor. >> >> This either means you must be sure you are under Xen to start with, or >> use an extable entry to catch the #UD fault if you are not under Xen. > > I think we are OK for PV because this code will be executed after > pvops are set and so we will be calling xen_cpuid(). Great. ~Andrew
On Mon, Feb 8, 2016 at 7:31 AM, Boris Ostrovsky <boris.ostrovsky@oracle.com> wrote: > > > On 02/06/2016 03:05 PM, Andy Lutomirski wrote: >> >> >> Anyway, this is all ridiculous. I propose that rather than trying to >> clean up paravirt_enabled, you just delete it. Here are its users: >> >> static inline bool is_hypervisor_range(int idx) >> { >> /* >> * ffff800000000000 - ffff87ffffffffff is reserved for >> * the hypervisor. >> */ >> return paravirt_enabled() && >> (idx >= pgd_index(__PAGE_OFFSET) - 16) && >> (idx < pgd_index(__PAGE_OFFSET)); >> } >> >> Nope, wrong. I don't really know what this code is trying to do, but >> I'm pretty sure it's wrong. Did this mean to check "is Xen PV"? Or >> was it "is Xen PV or lgeust"? Or what? > > > This range is reserved for hypervisors but the only hypervisor that uses it > is Xen PV (lguest doesn't run in 64-bit mode). > > The check is intended to catch mappings on baremetal kernels that shouldn't > be there. In other words it's not Xen PV that needs it, it's baremetal that > wants to catch errors. > > >> >> if (apm_info.bios.version == 0 || paravirt_enabled() || >> machine_is_olpc()) { >> printk(KERN_INFO "apm: BIOS not found.\n"); >> return -ENODEV; >> } >> >> I assume that is trying to avoid checking for APM on systems that are >> known to be too new. How about cleanup up the condition to check >> something sensible? > > > The check here I suspect is unnecessary since version is taken from > boot_params and thus should be zero for Xen PV (don't know about lguest but > if it's not then we could just set it there). > >> >> if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { >> static int f00f_workaround_enabled; >> [...] >> >> This is asking "are we the natively booted kernel?". This has nothing >> to do with paravirt in particular. How about just deleting that code? >> It seems pointless. Sure, it's the responsibility of the real root >> kernel, but nothing will break if a guest kernel also does the fixup. > > > Yes, I also think so. > >> >> int __init microcode_init(void) >> { >> [...] >> if (paravirt_enabled() || dis_ucode_ldr) >> return -EINVAL; >> >> This is also asking "are we the natively booted kernel?" This is >> plausibly useful for real. (Borislav, is this actually necessary?) >> Seems to me there should be a function is_native_root_kernel() or >> similar. Obviously it could have false positives and code will have >> to deal with that. (This also could be entirely wrong. What code is >> responsible for CPU microcode updates on Xen? For all I know, dom0 is >> *supposed* to apply microcode updates, in which case that check really >> should be deleted. >> >> void __init reserve_ebda_region(void) >> { >> [...] >> if (paravirt_enabled()) >> return; >> >> I don't know what the point of this one is. > > > Not sure about this one neither. > >> >> pnpbios turns itself off if paravirt_enabled(). I'm not convinced >> that's correct. >> >> /* only a natively booted kernel should be using TXT */ >> if (paravirt_enabled()) { >> pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); >> return; >> } >> >> Er, what's wrong with trying to talk to tboot on paravirt? It won't >> be there unless something is rather wrong. In any case, this could >> use is_native_root_kernel(). > > > As in apm_info case, I don't think this check is necessary. > >> >> if (paravirt_enabled() && !paravirt_has(RTC)) >> return -ENODEV; >> >> This actually seems legit. But how about reversing it: if >> paravirt_has(NO_RTC) return -ENODEV? Problem solved. >> >> paravirt_enabled is also used in entry_32.S: >> >> cmpl $0, pv_info+PARAVIRT_enabled >> >> This is actually trying to check whether pv_cpu_ops.iret == >> native_iret. I sincerely hope that no additional support is *ever* >> added to x86 Linux for systems on which this is not the case. > > > I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here. > I think we can do one better and rearrange the code a bit to look more like the 64-bit version. See: commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3 Author: Andy Lutomirski <luto@amacapital.net> Date: Wed Jul 23 08:34:11 2014 -0700 x86_64/entry/xen: Do not invoke espfix64 on Xen I'll give this a try soon. --Andy
On 02/06/2016 03:59 AM, Luis R. Rodriguez wrote: > The enumeration of legacy crap by ACPI boot flags seems to provide enough > details to suit our needs if we really wanted to zero down on the specifics of > what paravirt_legacy() means, there are these flags: > > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */ > #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */ > #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */ > #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */ > #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */ > #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */ > #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */ > > I checked and I didn't see qemu using any of the ACPI boot flags, > but I suspected qemu instances must use a series of legacy crap. > Likewise for KVM. > > coreboot defines legacy free when you don't have any of the above flags set: > #define ACPI_FADT_LEGACY_FREE 0x00 /* No legacy devices (including 8042) * > > Would it be sufficient to just stick to "pv legacy" equivalent > of requiring just ACPI_FADT_LEGACY_DEVICES and ACPI_FADT_8042 ? I believe unprivileged PV guests may be compiled without CONFIG_ACPI if you are talking about requiring (or manually setting) these flags. -boris > > I should point out It turns out ACPI_FADT_NO_CMOS_RTC matches lguest's and it > seems that's the only reason we have that RTC PV flag and the features pv > field... with the linker table + x86 subarch use it should be relatively simple > to remove paravirt_has_feature() paravirt_has() and PV_SUPPORTED_RTC. lguest > would just be the only subarch that opts out. > > Luis
On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote: > On Sat, Feb 6, 2016 at 12:59 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote: > > On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote: > >> On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote: > >> > > >> > paravirt_enabled conveys the idea that if this is set or if > >> > paravirt_enabled() returns true you are in a paravirtualized > >> > environment. This is not true by any means, and left as-is > >> > is just causing confusion and is prone to be misused and abused. > >> > > >> > This primitive is really only useful to determine if you have a > >> > paravirtualization hypervisor that supports legacy paravirtualized > >> > guests. At run time, this tells us if we've booted into a Linux guest > >> > with support for legacy devices and features. > >> > > >> > To avoid further issues with semantics on this we loosely borrow > >> > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot > >> > Architecture Flags" section and the PC 2001 definition in the PC > >> > Systems design guide [0]: > >> > > >> > paravirt_legacy() is true if this hypervisor supports legacy > >> > x86 paravirtualized guests. > >> > >> This needs to be far more concrete. I'm reasonably well versed in x86 > >> details relevant to kernels ans I have *no clue* what your semantics > >> mean. > > > > Interesting. I'm glad you're chiming in :) > > > >> > +/** > >> > + * struct pv_info - paravirt hypervisor information > >> > + * > >> > + * @supports_x86_legacy: true if this hypervisor supports legacy x86 > >> > + * paravirtualized guests. The definition of legacy here adheres > >> > + * *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot > >> > + * Architecture Flags" section and the PC 2001 "legacy free" concept [1] > >> > + * referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50 > >> > + * [4]. Legacy x86 guests systems are guest systems which are not "legacy > >> > + * free" as per the PC 2001 definition, and in the ACPI sense could have > >> > + * any of the legacy ACPI IA-PC Boot architecture flags set. These are x86 > >> > + * systems with any type of legacy peripherals or requirements. > >> > + * > >> > + * Examples of some popular legacy peripherals: > >> > + * > >> > + * a) Floppy drive > >> > + * b) Legacy ports [1] such as such as parallel ports, PS/2 connectors, > >> > + * serial ports / RS-232, game ports Parallel ATA, and IEEE 1394 > >> > + * c) ISA bus > >> > + * > >> > + * Examples of features required to support such type of legacy guests > >> > + * are the need for APM and a PNP BIOS. > >> > >> Seriously? > >> > >> I think you just defined every standard native x86 system > >> as well as QEMU/KVM as "legacy". > > > > I was a afraid it was too broad, but its why I used "loosely". > > > >> Can we just enumerate this crap? I propose: > >> > >> Xen PV and lguest are paravirt_legacy. Nothing else is > >> paravirt_legacy. > > > > Fine by me! > > > >> The addition of new paravirt_legacy support is > >> strongly discouraged. > > > > Fine by me, but Boris is re-using that for HVMLite on his recently > > proposed patches. Do we want that ? I'll also note that the goal > > is to ensure its hardware_subarch will be 0 (PC), meanwhile old > > PV will be Xen (although this hasn't been set yet). This mess is > > part of the reason why I do think stronger semantics and clearer > > definitions will help here. Without strong semantics I can't help > > address the dead code concerns I've been rambling over. > > > > HVMLite is just a rebranding for PVH done right, and then it seems PVH will be > > dropped and we'll have HVMLite rebranded back to PVH I guess it seems. > > PVH/HVMlite had better not be "paravirt" in this sense, I hope. Cc'ing David Vrabel so he's aware of this ongoing discussion, which relates to the HVMLite series. As far as I follow both this patch series discussion and the HVMLite patch series by Boris O, it doesn't sound like we yet have a solution for this given that as-is the patch series for HVMLite still uses paravirt_enabled = 1. Perhaps the discussion ongoing on this thread will lead us to a safe way to distinguish. > > The enumeration of legacy crap by ACPI boot flags seems to provide enough > > details to suit our needs if we really wanted to zero down on the specifics of > > what paravirt_legacy() means, there are these flags: > > > > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */ > > #define ACPI_FADT_LEGACY_DEVICES (1) /* 00: [V2] System has LPC or ISA bus devices */ > > #define ACPI_FADT_8042 (1<<1) /* 01: [V3] System has an 8042 controller on port 60/64 */ > > #define ACPI_FADT_NO_VGA (1<<2) /* 02: [V4] It is not safe to probe for VGA hardware */ > > #define ACPI_FADT_NO_MSI (1<<3) /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */ > > #define ACPI_FADT_NO_ASPM (1<<4) /* 04: [V4] PCIe ASPM control must not be enabled */ > > #define ACPI_FADT_NO_CMOS_RTC (1<<5) /* 05: [V5] No CMOS real-time clock present */ > > > > I checked and I didn't see qemu using any of the ACPI boot flags, > > but I suspected qemu instances must use a series of legacy crap. > > Likewise for KVM. > > So shouldn't Linux find out things from the FADT by reading the FADT > rather than squishing them into a confusing single function? Boris O replied saying it seems some Xen PV guests can exist without ACPI, but I was in the same hope as you were as it would seem the architecturally sane way to do this. PV legacy should ultimately die, so I'm not too concerned over it, I think we just need a clean way to deprecate it without causing regressions, but also enabling the best architecture forward. > Anyway, this is all ridiculous. I propose that rather than trying to > clean up paravirt_enabled, you just delete it. Yay. > Here are its users: > > static inline bool is_hypervisor_range(int idx) > { > /* > * ffff800000000000 - ffff87ffffffffff is reserved for > * the hypervisor. > */ > return paravirt_enabled() && > (idx >= pgd_index(__PAGE_OFFSET) - 16) && > (idx < pgd_index(__PAGE_OFFSET)); > } > > Nope, wrong. I don't really know what this code is trying to do, but > I'm pretty sure it's wrong. Did this mean to check "is Xen PV"? Or > was it "is Xen PV or lgeust"? Or what? It seems Boris and others have replied to this and a solution to replace it is in sight without paravirt_enabled(). > if (apm_info.bios.version == 0 || paravirt_enabled() || > machine_is_olpc()) { > printk(KERN_INFO "apm: BIOS not found.\n"); > return -ENODEV; > } > > I assume that is trying to avoid checking for APM on systems that are > known to be too new. How about cleanup up the condition to check > something sensible? With my patch series I am pretty sure I can remove this check by relying on the subarch. > if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { > static int f00f_workaround_enabled; > [...] > > This is asking "are we the natively booted kernel?". This has nothing > to do with paravirt in particular. How about just deleting that code? > It seems pointless. Sure, it's the responsibility of the real root > kernel, but nothing will break if a guest kernel also does the fixup. Likewise. This is avoiding the foof workaround on PV guests as the hypervisor would have done that itself. Also I was told by Konrad at the last Xen Developer summit that it seems like we might even be able to remove the foof workaround from upstream, but this was just speculation. We could not confirm it. If we need it I think I can address this with my work. > int __init microcode_init(void) > { > [...] > if (paravirt_enabled() || dis_ucode_ldr) > return -EINVAL; > > > This is also asking "are we the natively booted kernel?" This is > plausibly useful for real. (Borislav, is this actually necessary?) > Seems to me there should be a function is_native_root_kernel() or > similar. Obviously it could have false positives and code will have > to deal with that. (This also could be entirely wrong. What code is > responsible for CPU microcode updates on Xen? For all I know, dom0 is > *supposed* to apply microcode updates, in which case that check really > should be deleted. Likewise. I can remove it but it sounds like others can remove it through other means as replied to this thread. > void __init reserve_ebda_region(void) > { > [...] > if (paravirt_enabled()) > return; > > I don't know what the point of this one is. > > pnpbios turns itself off if paravirt_enabled(). I'm not convinced > that's correct. Either way my set of patches address this by relying on the subarch. > /* only a natively booted kernel should be using TXT */ > if (paravirt_enabled()) { > pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); > return; > } > > Er, what's wrong with trying to talk to tboot on paravirt? It won't > be there unless something is rather wrong. In any case, this could > use is_native_root_kernel(). If really needed likewise, I think the subarch can be used... > > if (paravirt_enabled() && !paravirt_has(RTC)) > return -ENODEV; > > This actually seems legit. But how about reversing it: if > paravirt_has(NO_RTC) return -ENODEV? Problem solved. I actually want to remove paravirt_has(NO_RTC), it turns out that only lguest needs to avoid this, then we not only remove paravirt_has(NO_RTC) but the RTF flag and the features paravirt crap. I am pretty sure I can remove this by just relying on the subarch, LGUEST would just not opt-in for this. > paravirt_enabled is also used in entry_32.S: > > cmpl $0, pv_info+PARAVIRT_enabled > > This is actually trying to check whether pv_cpu_ops.iret == > native_iret. I sincerely hope that no additional support is *ever* > added to x86 Linux for systems on which this is not the case. Not sure yet as this is early code and that's where I'm having issues, and its why I've been asking for a simple x86 hypervisr type and seeing if we can just access it in really early code. > So yeah, I think the right solution is to delete paravirt_enabled. Lets do it. Given this is all work we all seem to agree I can just drop this rename for now, and the other 2 patches can be merged for now (or ignored, let me know, and then we can just drop things as we go forward with either the subarch for Xen being set and used with the linker table stuff, and/or with the other alternatives that are being discussed so far on this thread by others. As noted though, we still need the HVMLite paravirt_enabled() thing sorted out, if we drop it, will HVMLite have a way to ensure it also gets a bail on the same routines? Luis
On Mon, Feb 08, 2016 at 04:46:08PM +0100, Borislav Petkov wrote: > On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote: > > This range is reserved for hypervisors but the only hypervisor that uses it > > is Xen PV (lguest doesn't run in 64-bit mode). > > Yeah, this is mentioned in arch/x86/include/asm/page_64_types.h: > > /* > * Set __PAGE_OFFSET to the most negative possible address + > * PGDIR_SIZE*16 (pgd slot 272). The gap is to allow a space for a > * hypervisor to fit. Choosing 16 slots here is arbitrary, but it's > * what Xen requires. > */ > #define __PAGE_OFFSET _AC(0xffff880000000000, UL) > > It is also in Documentation/x86/x86_64/mm.txt: > > ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor > > albeit a bit laconic. It doesn't sound like we have a replacement strategy for this yet? If not, for PV we can rely on the subarch, but only once its set, I can take this on as part of my work if agreeable given I'm adding the use of the Xen subarch. If we want this in before linker table stuff gets fleshed out this I can just split out setting the subarch on xen_start_kernel() and replacing paravirt_enabled() where applicable. But -- what about HVMLite for this, given as-is currently HVMLite is to use PC subarch? Luis
On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote: > > > On 02/06/2016 03:05 PM, Andy Lutomirski wrote: > > > >Anyway, this is all ridiculous. I propose that rather than trying to > >clean up paravirt_enabled, you just delete it. Here are its users: > > > >static inline bool is_hypervisor_range(int idx) > >{ > > /* > > * ffff800000000000 - ffff87ffffffffff is reserved for > > * the hypervisor. > > */ > > return paravirt_enabled() && > > (idx >= pgd_index(__PAGE_OFFSET) - 16) && > > (idx < pgd_index(__PAGE_OFFSET)); > >} > > > >Nope, wrong. I don't really know what this code is trying to do, but > >I'm pretty sure it's wrong. Did this mean to check "is Xen PV"? Or > >was it "is Xen PV or lgeust"? Or what? > > This range is reserved for hypervisors but the only hypervisor that > uses it is Xen PV (lguest doesn't run in 64-bit mode). > > The check is intended to catch mappings on baremetal kernels that > shouldn't be there. In other words it's not Xen PV that needs it, > it's baremetal that wants to catch errors. OK this is being discussed. > > > > > > if (apm_info.bios.version == 0 || paravirt_enabled() || > >machine_is_olpc()) { > > printk(KERN_INFO "apm: BIOS not found.\n"); > > return -ENODEV; > > } > > > >I assume that is trying to avoid checking for APM on systems that are > >known to be too new. How about cleanup up the condition to check > >something sensible? > > The check here I suspect is unnecessary since version is taken from > boot_params and thus should be zero for Xen PV (don't know about > lguest but if it's not then we could just set it there). Great, but we won't know unless we can test both. Until then we need the check, but I'm noting a solution is to use the subarch and my series can address this. > > > > if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { > > static int f00f_workaround_enabled; > > [...] > > > >This is asking "are we the natively booted kernel?". This has nothing > >to do with paravirt in particular. How about just deleting that code? > > It seems pointless. Sure, it's the responsibility of the real root > >kernel, but nothing will break if a guest kernel also does the fixup. > > Yes, I also think so. Think is very different than knowing, so without a test we won't know. Until we test it, we could safely replace this check with the subarch stuff. Both lguest and PV Xen would opt-out as they do now. > >int __init microcode_init(void) > >{ > > [...] > > if (paravirt_enabled() || dis_ucode_ldr) > > return -EINVAL; > > > >This is also asking "are we the natively booted kernel?" This is > >plausibly useful for real. (Borislav, is this actually necessary?) > >Seems to me there should be a function is_native_root_kernel() or > >similar. Obviously it could have false positives and code will have > >to deal with that. (This also could be entirely wrong. What code is > >responsible for CPU microcode updates on Xen? For all I know, dom0 is > >*supposed* to apply microcode updates, in which case that check really > >should be deleted. > > > >void __init reserve_ebda_region(void) > >{ > > [...] > > if (paravirt_enabled()) > > return; > > > >I don't know what the point of this one is. > > Not sure about this one neither. This is about the BIOS Etra BDA space that the is reserved, we reserve this on bare metal to give the OS access to this, but on a PV guest this is not emulated AFAICT -- we skip reserving this space as its not needed to do so for PV guests. This is what was explained to me when I asked about this during the session I put together over dead code concerns and I brought this one up. Konrad and Andrew might be able to chime in more. Anyway, with the subarch stuff I'm doing we can keep it and peg the check with the subarch. > > > >pnpbios turns itself off if paravirt_enabled(). I'm not convinced > >that's correct. > > > > /* only a natively booted kernel should be using TXT */ > > if (paravirt_enabled()) { > > pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); > > return; > > } > > > >Er, what's wrong with trying to talk to tboot on paravirt? It won't > >be there unless something is rather wrong. In any case, this could > >use is_native_root_kernel(). > > As in apm_info case, I don't think this check is necessary. Again, *think* is different than knowing. If we want to keep it just opting out of it for Xen and lguest subarch should suffice here. > > > > if (paravirt_enabled() && !paravirt_has(RTC)) > > return -ENODEV; > > > >This actually seems legit. But how about reversing it: if > >paravirt_has(NO_RTC) return -ENODEV? Problem solved. > > > >paravirt_enabled is also used in entry_32.S: > > > >cmpl $0, pv_info+PARAVIRT_enabled > > > >This is actually trying to check whether pv_cpu_ops.iret == > >native_iret. I sincerely hope that no additional support is *ever* > >added to x86 Linux for systems on which this is not the case. > > I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here. Seems Andy is on this. Luis
OK so here's a wiki to keep track of progress of the difference uses: http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled It seems we have a resolution one way or another for all except for the use on arch/x86/mm/dump_pagetables.c, is that right? So to be clear we should not merge more users of paravirt_enabled. Luis
diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S index 4c5228352744..6a248022549c 100644 --- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -395,7 +395,7 @@ ldt_ss: * is still available to implement the setting of the high * 16-bits in the INTERRUPT_RETURN paravirt-op. */ - cmpl $0, pv_info+PARAVIRT_enabled + cmpl $0, pv_info+PARAVIRT_legacy jne restore_nocheck #endif diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h index 6542aa99714b..b3885c1f2156 100644 --- a/arch/x86/include/asm/paravirt.h +++ b/arch/x86/include/asm/paravirt.h @@ -14,14 +14,14 @@ #include <linux/types.h> #include <linux/cpumask.h> -static inline bool paravirt_enabled(void) +static inline bool paravirt_legacy(void) { - return pv_info.paravirt_enabled; + return pv_info.supports_x86_legacy; } static inline bool paravirt_has_feature(unsigned int feature) { - WARN_ON_ONCE(!paravirt_enabled()); + WARN_ON_ONCE(!paravirt_legacy()); return !!(pv_info.features & feature); } diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h index de2382b023f2..b4094a57435d 100644 --- a/arch/x86/include/asm/paravirt_types.h +++ b/arch/x86/include/asm/paravirt_types.h @@ -61,6 +61,37 @@ struct paravirt_callee_save { }; /* general info */ + +/** + * struct pv_info - paravirt hypervisor information + * + * @supports_x86_legacy: true if this hypervisor supports legacy x86 + * paravirtualized guests. The definition of legacy here adheres + * *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot + * Architecture Flags" section and the PC 2001 "legacy free" concept [1] + * referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50 + * [4]. Legacy x86 guests systems are guest systems which are not "legacy + * free" as per the PC 2001 definition, and in the ACPI sense could have + * any of the legacy ACPI IA-PC Boot architecture flags set. These are x86 + * systems with any type of legacy peripherals or requirements. + * + * Examples of some popular legacy peripherals: + * + * a) Floppy drive + * b) Legacy ports [1] such as such as parallel ports, PS/2 connectors, + * serial ports / RS-232, game ports Parallel ATA, and IEEE 1394 + * c) ISA bus + * + * Examples of features required to support such type of legacy guests + * are the need for APM and a PNP BIOS. + * + * [0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf + * [1] https://en.wikipedia.org/wiki/Legacy-free_PC + * [2] https://en.wikipedia.org/wiki/PC_System_Design_Guide + * [3] http://tech-insider.org/windows/research/2000/1102.html + * [4] http://tech-insider.org/windows/research/acrobat/001102/03sys-2001.pdf + * [5] https://en.wikipedia.org/wiki/Legacy_port + */ struct pv_info { unsigned int kernel_rpl; int shared_kernel_pmd; @@ -69,8 +100,8 @@ struct pv_info { u16 extra_user_64bit_cs; /* __USER_CS if none */ #endif - bool paravirt_enabled; - unsigned int features; /* valid only if paravirt_enabled is set */ + bool supports_x86_legacy; + unsigned int features; /* valid only on pv x86 legacy systems */ const char *name; }; diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 5a8e7a61d5be..6052a038b27f 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -470,7 +470,7 @@ static inline unsigned long current_top_of_stack(void) #include <asm/paravirt.h> #else #define __cpuid native_cpuid -#define paravirt_enabled() false +#define paravirt_legacy() false #define paravirt_has(x) false static inline void load_sp0(struct tss_struct *tss, diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c index 052c9c3026cc..74a3e8ca2c6d 100644 --- a/arch/x86/kernel/apm_32.c +++ b/arch/x86/kernel/apm_32.c @@ -2267,7 +2267,7 @@ static int __init apm_init(void) dmi_check_system(apm_dmi_table); - if (apm_info.bios.version == 0 || paravirt_enabled() || machine_is_olpc()) { + if (apm_info.bios.version == 0 || paravirt_legacy() || machine_is_olpc()) { printk(KERN_INFO "apm: BIOS not found.\n"); return -ENODEV; } diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c index 84a7524b202c..f49b9cbf0561 100644 --- a/arch/x86/kernel/asm-offsets.c +++ b/arch/x86/kernel/asm-offsets.c @@ -59,7 +59,7 @@ void common(void) { #ifdef CONFIG_PARAVIRT BLANK(); - OFFSET(PARAVIRT_enabled, pv_info, paravirt_enabled); + OFFSET(PARAVIRT_legacy, pv_info, supports_x86_legacy); OFFSET(PARAVIRT_PATCH_pv_cpu_ops, paravirt_patch_template, pv_cpu_ops); OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops); OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable); diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 6040bd71679f..aab7d031f612 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -220,7 +220,7 @@ static void intel_workarounds(struct cpuinfo_x86 *c) * The Quark is also family 5, but does not have the same bug. */ clear_cpu_bug(c, X86_BUG_F00F); - if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) { + if (!paravirt_legacy() && c->x86 == 5 && c->x86_model < 9) { static int f00f_workaround_enabled; set_cpu_bug(c, X86_BUG_F00F); diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c index faec7120c508..5137e06028b3 100644 --- a/arch/x86/kernel/cpu/microcode/core.c +++ b/arch/x86/kernel/cpu/microcode/core.c @@ -630,7 +630,7 @@ int __init microcode_init(void) struct cpuinfo_x86 *c = &boot_cpu_data; int error; - if (paravirt_enabled() || dis_ucode_ldr) + if (paravirt_legacy() || dis_ucode_ldr) return -EINVAL; if (c->x86_vendor == X86_VENDOR_INTEL) diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c index 992f442ca155..279fad7288f8 100644 --- a/arch/x86/kernel/head.c +++ b/arch/x86/kernel/head.c @@ -38,7 +38,7 @@ void __init reserve_ebda_region(void) * that the paravirt case can handle memory setup * correctly, without our help. */ - if (paravirt_enabled()) + if (paravirt_legacy()) return; /* end of low (conventional) memory */ diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 5c717b247e1b..177773e75b4d 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -283,14 +283,7 @@ NOKPROBE_SYMBOL(do_async_page_fault); static void __init paravirt_ops_setup(void) { pv_info.name = "KVM"; - - /* - * KVM isn't paravirt in the sense of paravirt_enabled. A KVM - * guest kernel works like a bare metal kernel with additional - * features, and paravirt_enabled is about features that are - * missing. - */ - pv_info.paravirt_enabled = false; + pv_info.supports_x86_legacy = false; if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY)) pv_cpu_ops.io_delay = kvm_io_delay; diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c index 6b1f205a6ac7..a2ccb328c287 100644 --- a/arch/x86/kernel/paravirt.c +++ b/arch/x86/kernel/paravirt.c @@ -294,7 +294,7 @@ enum paravirt_lazy_mode paravirt_get_lazy_mode(void) struct pv_info pv_info = { .name = "bare hardware", - .paravirt_enabled = false, + .supports_x86_legacy = false, .kernel_rpl = 0, .shared_kernel_pmd = 1, /* Only used when CONFIG_X86_PAE is set */ diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c index 4af8d063fb36..1d20e3347906 100644 --- a/arch/x86/kernel/rtc.c +++ b/arch/x86/kernel/rtc.c @@ -200,7 +200,7 @@ static __init int add_rtc_cmos(void) } #endif - if (paravirt_enabled() && !paravirt_has(RTC)) + if (paravirt_legacy() && !paravirt_has(RTC)) return -ENODEV; platform_device_register(&rtc_device); diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c index 91a4496db434..87a5027ff242 100644 --- a/arch/x86/kernel/tboot.c +++ b/arch/x86/kernel/tboot.c @@ -75,7 +75,7 @@ void __init tboot_probe(void) } /* only a natively booted kernel should be using TXT */ - if (paravirt_enabled()) { + if (paravirt_legacy()) { pr_warning("non-0 tboot_addr but pv_ops is enabled\n"); return; } diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c index c6f302f6dedb..dc33156e4d53 100644 --- a/arch/x86/lguest/boot.c +++ b/arch/x86/lguest/boot.c @@ -1408,8 +1408,8 @@ __init void lguest_init(void) { /* We're under lguest. */ pv_info.name = "lguest"; - /* Paravirt is enabled. */ - pv_info.paravirt_enabled = true; + /* Hypervisor supports x86 legacy guest systems . */ + pv_info.supports_x86_legacy = true; /* We're running at privilege level 1, not 0 as normal. */ pv_info.kernel_rpl = 1; /* Everyone except Xen runs with this set. */ diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c index 4a6f1d9b5106..e9cc69c5ad54 100644 --- a/arch/x86/mm/dump_pagetables.c +++ b/arch/x86/mm/dump_pagetables.c @@ -365,7 +365,7 @@ static inline bool is_hypervisor_range(int idx) * ffff800000000000 - ffff87ffffffffff is reserved for * the hypervisor. */ - return paravirt_enabled() && + return paravirt_legacy() && (idx >= pgd_index(__PAGE_OFFSET) - 16) && (idx < pgd_index(__PAGE_OFFSET)); } diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index e303e0043881..2485a1613c2f 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -1186,7 +1186,7 @@ static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf, } static const struct pv_info xen_info __initconst = { - .paravirt_enabled = true, + .supports_x86_legacy = true, .shared_kernel_pmd = 0, #ifdef CONFIG_X86_64 diff --git a/drivers/pnp/pnpbios/core.c b/drivers/pnp/pnpbios/core.c index facd43b8516c..40557bc16005 100644 --- a/drivers/pnp/pnpbios/core.c +++ b/drivers/pnp/pnpbios/core.c @@ -521,7 +521,7 @@ static int __init pnpbios_init(void) int ret; if (pnpbios_disabled || dmi_check_system(pnpbios_dmi_table) || - paravirt_enabled()) { + paravirt_legacy()) { printk(KERN_INFO "PnPBIOS: Disabled\n"); return -ENODEV; }
paravirt_enabled conveys the idea that if this is set or if paravirt_enabled() returns true you are in a paravirtualized environment. This is not true by any means, and left as-is is just causing confusion and is prone to be misused and abused. This primitive is really only useful to determine if you have a paravirtualization hypervisor that supports legacy paravirtualized guests. At run time, this tells us if we've booted into a Linux guest with support for legacy devices and features. To avoid further issues with semantics on this we loosely borrow the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot Architecture Flags" section and the PC 2001 definition in the PC Systems design guide [0]: paravirt_legacy() is true if this hypervisor supports legacy x86 paravirtualized guests. Renaming the member and helper to make this clear and document this well. With proper documentation now we can avoiding special cased comments trying to explain what the heck this is. [0] http://tech-insider.org/windows/research/2000/1102.html [1] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf v2: * Fix 0-day bot build issue on arch/x86/entry/entry_32.S where I forgot to update the upper cas eoffset name, defined in arch/x86/kernel/asm-offsets.c * Add more documentation and references for what exactly is x86 legacy, and how this inspired the notion of a paravirt legacy device or feature. * Use supports_x86_legacy on the struct member to make it clearer what this bool is for, keep the paravirt_legacy() from v1. * Split out changes into a few patches to make it easier to review and test. The rename is done using the following Coccinelle SmPL patch: @ rename_paravirt_enabled @ @@ -paravirt_enabled() +paravirt_legacy() @ rename_pv_info_pv_enabled @ @@ -pv_info.paravirt_enabled +pv_info.supports_x86_legacy @ is_pv @ identifier pv; @@ struct pv_info pv = { }; @ rename_struct_pv_enabled depends on is_pv @ identifier is_pv.pv; expression val; @@ struct pv_info pv = { - .paravirt_enabled + .supports_x86_legacy = val, }; Generated-by: Coccinelle SmPL Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Robert Moore <robert.moore@intel.com> Cc: Fengguang Wu <fengguang.wu@intel.com Cc: rafael.j.wysocki@intel.com Cc: lv.zheng@intel.com Cc: david.e.box@intel.com Cc: cocci@systeme.lip6.fr cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org> --- arch/x86/entry/entry_32.S | 2 +- arch/x86/include/asm/paravirt.h | 6 +++--- arch/x86/include/asm/paravirt_types.h | 35 +++++++++++++++++++++++++++++++++-- arch/x86/include/asm/processor.h | 2 +- arch/x86/kernel/apm_32.c | 2 +- arch/x86/kernel/asm-offsets.c | 2 +- arch/x86/kernel/cpu/intel.c | 2 +- arch/x86/kernel/cpu/microcode/core.c | 2 +- arch/x86/kernel/head.c | 2 +- arch/x86/kernel/kvm.c | 9 +-------- arch/x86/kernel/paravirt.c | 2 +- arch/x86/kernel/rtc.c | 2 +- arch/x86/kernel/tboot.c | 2 +- arch/x86/lguest/boot.c | 4 ++-- arch/x86/mm/dump_pagetables.c | 2 +- arch/x86/xen/enlighten.c | 2 +- drivers/pnp/pnpbios/core.c | 2 +- 17 files changed, 52 insertions(+), 28 deletions(-)