diff mbox

[v2,3/3] paravirt: rename paravirt_enabled to paravirt_legacy

Message ID 1454733014-15237-4-git-send-email-mcgrof@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Luis Chamberlain Feb. 6, 2016, 4:30 a.m. UTC
paravirt_enabled conveys the idea that if this is set or if
paravirt_enabled() returns true you are in a paravirtualized
environment. This is not true by any means, and left as-is
is just causing confusion and is prone to be misused and abused.

This primitive is really only useful to determine if you have a
paravirtualization hypervisor that supports legacy paravirtualized
guests. At run time, this tells us if we've booted into a Linux guest
with support for legacy devices and features.

To avoid further issues with semantics on this we loosely borrow
the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot
Architecture Flags" section and the PC 2001 definition in the PC
Systems design guide [0]:

  paravirt_legacy() is true if this hypervisor supports legacy
                    x86 paravirtualized guests.

Renaming the member and helper to make this clear and document
this well. With proper documentation now we can avoiding special
cased comments trying to explain what the heck this is.

[0] http://tech-insider.org/windows/research/2000/1102.html
[1] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf

v2:

* Fix 0-day bot build issue on arch/x86/entry/entry_32.S
  where I forgot to update the upper cas eoffset name,
  defined in arch/x86/kernel/asm-offsets.c

* Add more documentation and references for what exactly is
  x86 legacy, and how this inspired the notion of a paravirt
  legacy device or feature.

* Use supports_x86_legacy on the struct member to make it
  clearer what this bool is for, keep the paravirt_legacy()
  from v1.

* Split out changes into a few patches to make it easier
  to review and test.

The rename is done using the following Coccinelle SmPL patch:

@ rename_paravirt_enabled @
@@

-paravirt_enabled()
+paravirt_legacy()

@ rename_pv_info_pv_enabled @
@@
-pv_info.paravirt_enabled
+pv_info.supports_x86_legacy

@ is_pv @
identifier pv;
@@
struct pv_info pv = {
};

@ rename_struct_pv_enabled depends on is_pv @
identifier is_pv.pv;
expression val;
@@

struct pv_info pv = {
-	.paravirt_enabled
+	.supports_x86_legacy
	= val,
};

Generated-by: Coccinelle SmPL
Suggested-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Robert Moore <robert.moore@intel.com>
Cc: Fengguang Wu <fengguang.wu@intel.com
Cc: rafael.j.wysocki@intel.com
Cc: lv.zheng@intel.com
Cc: david.e.box@intel.com
Cc: cocci@systeme.lip6.fr
cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
 arch/x86/entry/entry_32.S             |  2 +-
 arch/x86/include/asm/paravirt.h       |  6 +++---
 arch/x86/include/asm/paravirt_types.h | 35 +++++++++++++++++++++++++++++++++--
 arch/x86/include/asm/processor.h      |  2 +-
 arch/x86/kernel/apm_32.c              |  2 +-
 arch/x86/kernel/asm-offsets.c         |  2 +-
 arch/x86/kernel/cpu/intel.c           |  2 +-
 arch/x86/kernel/cpu/microcode/core.c  |  2 +-
 arch/x86/kernel/head.c                |  2 +-
 arch/x86/kernel/kvm.c                 |  9 +--------
 arch/x86/kernel/paravirt.c            |  2 +-
 arch/x86/kernel/rtc.c                 |  2 +-
 arch/x86/kernel/tboot.c               |  2 +-
 arch/x86/lguest/boot.c                |  4 ++--
 arch/x86/mm/dump_pagetables.c         |  2 +-
 arch/x86/xen/enlighten.c              |  2 +-
 drivers/pnp/pnpbios/core.c            |  2 +-
 17 files changed, 52 insertions(+), 28 deletions(-)

Comments

Andy Lutomirski Feb. 6, 2016, 7:11 a.m. UTC | #1
On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
>
> paravirt_enabled conveys the idea that if this is set or if
> paravirt_enabled() returns true you are in a paravirtualized
> environment. This is not true by any means, and left as-is
> is just causing confusion and is prone to be misused and abused.
>
> This primitive is really only useful to determine if you have a
> paravirtualization hypervisor that supports legacy paravirtualized
> guests. At run time, this tells us if we've booted into a Linux guest
> with support for legacy devices and features.
>
> To avoid further issues with semantics on this we loosely borrow
> the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot
> Architecture Flags" section and the PC 2001 definition in the PC
> Systems design guide [0]:
>
>   paravirt_legacy() is true if this hypervisor supports legacy
>                     x86 paravirtualized guests.

This needs to be far more concrete.  I'm reasonably well versed in x86
details relevant to kernels ans I have *no clue* what your semantics
mean.

> +/**
> + * struct pv_info - paravirt hypervisor information
> + *
> + * @supports_x86_legacy: true if this hypervisor supports legacy x86
> + *     paravirtualized guests.  The definition of legacy here adheres
> + *     *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot
> + *     Architecture Flags" section and the PC 2001 "legacy free" concept [1]
> + *     referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50
> + *     [4].  Legacy x86 guests systems are guest systems which are not "legacy
> + *     free" as per the PC 2001 definition, and in the ACPI sense could have
> + *     any of the legacy ACPI IA-PC Boot architecture flags set. These are x86
> + *     systems with any type of legacy peripherals or requirements.
> + *
> + *     Examples of some popular legacy peripherals:
> + *
> + *       a) Floppy drive
> + *       b) Legacy ports [1] such as such as parallel ports, PS/2 connectors,
> + *          serial ports / RS-232, game ports Parallel ATA, and IEEE 1394
> + *       c) ISA bus
> + *
> + *     Examples of features required to support such type of legacy guests
> + *     are the need for APM and a PNP BIOS.

Seriously?  I think you just defined every standard native x86 system
as well as QEMU/KVM as "legacy".

Can we just enumerate this crap?  I propose:

Xen PV and lguest are paravirt_legacy.  Nothing else is
paravirt_legacy.  The addition of new paravirt_legacy support is
strongly discouraged.

--Andy
Luis Chamberlain Feb. 6, 2016, 8:59 a.m. UTC | #2
On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote:
> On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> >
> > paravirt_enabled conveys the idea that if this is set or if
> > paravirt_enabled() returns true you are in a paravirtualized
> > environment. This is not true by any means, and left as-is
> > is just causing confusion and is prone to be misused and abused.
> >
> > This primitive is really only useful to determine if you have a
> > paravirtualization hypervisor that supports legacy paravirtualized
> > guests. At run time, this tells us if we've booted into a Linux guest
> > with support for legacy devices and features.
> >
> > To avoid further issues with semantics on this we loosely borrow
> > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot
> > Architecture Flags" section and the PC 2001 definition in the PC
> > Systems design guide [0]:
> >
> >   paravirt_legacy() is true if this hypervisor supports legacy
> >                     x86 paravirtualized guests.
> 
> This needs to be far more concrete.  I'm reasonably well versed in x86
> details relevant to kernels ans I have *no clue* what your semantics
> mean.

Interesting. I'm glad you're chiming in :)

> > +/**
> > + * struct pv_info - paravirt hypervisor information
> > + *
> > + * @supports_x86_legacy: true if this hypervisor supports legacy x86
> > + *     paravirtualized guests.  The definition of legacy here adheres
> > + *     *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot
> > + *     Architecture Flags" section and the PC 2001 "legacy free" concept [1]
> > + *     referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50
> > + *     [4].  Legacy x86 guests systems are guest systems which are not "legacy
> > + *     free" as per the PC 2001 definition, and in the ACPI sense could have
> > + *     any of the legacy ACPI IA-PC Boot architecture flags set. These are x86
> > + *     systems with any type of legacy peripherals or requirements.
> > + *
> > + *     Examples of some popular legacy peripherals:
> > + *
> > + *       a) Floppy drive
> > + *       b) Legacy ports [1] such as such as parallel ports, PS/2 connectors,
> > + *          serial ports / RS-232, game ports Parallel ATA, and IEEE 1394
> > + *       c) ISA bus
> > + *
> > + *     Examples of features required to support such type of legacy guests
> > + *     are the need for APM and a PNP BIOS.
> 
> Seriously?
>
> I think you just defined every standard native x86 system
> as well as QEMU/KVM as "legacy".

I was a afraid it was too broad, but its why I used "loosely".

> Can we just enumerate this crap?  I propose:
> 
> Xen PV and lguest are paravirt_legacy.  Nothing else is
> paravirt_legacy.

Fine by me!

> The addition of new paravirt_legacy support is
> strongly discouraged.

Fine by me, but Boris is re-using that for HVMLite on his recently
proposed patches. Do we want that ? I'll also note that the goal
is to ensure its hardware_subarch will be 0 (PC), meanwhile old
PV will be Xen (although this hasn't been set yet). This mess is
part of the reason why I do think stronger semantics and clearer
definitions will help here. Without strong semantics I can't help
address the dead code concerns I've been rambling over.

HVMLite is just a rebranding for PVH done right, and then it seems PVH will be
dropped and we'll have HVMLite rebranded back to PVH I guess it seems.

The enumeration of legacy crap by ACPI boot flags seems to provide enough
details to suit our needs if we really wanted to zero down on the specifics of
what paravirt_legacy() means, there are these flags:

/* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
#define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
#define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
#define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
#define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
#define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
#define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */

I checked and I didn't see qemu using any of the ACPI boot flags,
but I suspected qemu instances must use a series of legacy crap.
Likewise for KVM.

coreboot defines legacy free when you don't have any of the above flags set:
#define ACPI_FADT_LEGACY_FREE   0x00    /* No legacy devices (including 8042) *

Would it be sufficient to just stick to "pv legacy" equivalent
of requiring just ACPI_FADT_LEGACY_DEVICES and ACPI_FADT_8042 ?

I should point out It turns out ACPI_FADT_NO_CMOS_RTC matches lguest's and it
seems that's the only reason we have that RTC PV flag and the features pv
field... with the linker table + x86 subarch use it should be relatively simple
to remove paravirt_has_feature() paravirt_has() and PV_SUPPORTED_RTC. lguest
would just be the only subarch that opts out.

  Luis
Andy Lutomirski Feb. 6, 2016, 8:05 p.m. UTC | #3
On Sat, Feb 6, 2016 at 12:59 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote:
>> On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
>> >
>> > paravirt_enabled conveys the idea that if this is set or if
>> > paravirt_enabled() returns true you are in a paravirtualized
>> > environment. This is not true by any means, and left as-is
>> > is just causing confusion and is prone to be misused and abused.
>> >
>> > This primitive is really only useful to determine if you have a
>> > paravirtualization hypervisor that supports legacy paravirtualized
>> > guests. At run time, this tells us if we've booted into a Linux guest
>> > with support for legacy devices and features.
>> >
>> > To avoid further issues with semantics on this we loosely borrow
>> > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot
>> > Architecture Flags" section and the PC 2001 definition in the PC
>> > Systems design guide [0]:
>> >
>> >   paravirt_legacy() is true if this hypervisor supports legacy
>> >                     x86 paravirtualized guests.
>>
>> This needs to be far more concrete.  I'm reasonably well versed in x86
>> details relevant to kernels ans I have *no clue* what your semantics
>> mean.
>
> Interesting. I'm glad you're chiming in :)
>
>> > +/**
>> > + * struct pv_info - paravirt hypervisor information
>> > + *
>> > + * @supports_x86_legacy: true if this hypervisor supports legacy x86
>> > + *     paravirtualized guests.  The definition of legacy here adheres
>> > + *     *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot
>> > + *     Architecture Flags" section and the PC 2001 "legacy free" concept [1]
>> > + *     referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50
>> > + *     [4].  Legacy x86 guests systems are guest systems which are not "legacy
>> > + *     free" as per the PC 2001 definition, and in the ACPI sense could have
>> > + *     any of the legacy ACPI IA-PC Boot architecture flags set. These are x86
>> > + *     systems with any type of legacy peripherals or requirements.
>> > + *
>> > + *     Examples of some popular legacy peripherals:
>> > + *
>> > + *       a) Floppy drive
>> > + *       b) Legacy ports [1] such as such as parallel ports, PS/2 connectors,
>> > + *          serial ports / RS-232, game ports Parallel ATA, and IEEE 1394
>> > + *       c) ISA bus
>> > + *
>> > + *     Examples of features required to support such type of legacy guests
>> > + *     are the need for APM and a PNP BIOS.
>>
>> Seriously?
>>
>> I think you just defined every standard native x86 system
>> as well as QEMU/KVM as "legacy".
>
> I was a afraid it was too broad, but its why I used "loosely".
>
>> Can we just enumerate this crap?  I propose:
>>
>> Xen PV and lguest are paravirt_legacy.  Nothing else is
>> paravirt_legacy.
>
> Fine by me!
>
>> The addition of new paravirt_legacy support is
>> strongly discouraged.
>
> Fine by me, but Boris is re-using that for HVMLite on his recently
> proposed patches. Do we want that ? I'll also note that the goal
> is to ensure its hardware_subarch will be 0 (PC), meanwhile old
> PV will be Xen (although this hasn't been set yet). This mess is
> part of the reason why I do think stronger semantics and clearer
> definitions will help here. Without strong semantics I can't help
> address the dead code concerns I've been rambling over.
>
> HVMLite is just a rebranding for PVH done right, and then it seems PVH will be
> dropped and we'll have HVMLite rebranded back to PVH I guess it seems.

PVH/HVMlite had better not be "paravirt" in this sense, I hope.

>
> The enumeration of legacy crap by ACPI boot flags seems to provide enough
> details to suit our needs if we really wanted to zero down on the specifics of
> what paravirt_legacy() means, there are these flags:
>
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
> #define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */
>
> I checked and I didn't see qemu using any of the ACPI boot flags,
> but I suspected qemu instances must use a series of legacy crap.
> Likewise for KVM.

So shouldn't Linux find out things from the FADT by reading the FADT
rather than squishing them into a confusing single function?

Anyway, this is all ridiculous.  I propose that rather than trying to
clean up paravirt_enabled, you just delete it.  Here are its users:

static inline bool is_hypervisor_range(int idx)
{
    /*
     * ffff800000000000 - ffff87ffffffffff is reserved for
     * the hypervisor.
     */
    return paravirt_enabled() &&
        (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
        (idx < pgd_index(__PAGE_OFFSET));
}

Nope, wrong.  I don't really know what this code is trying to do, but
I'm pretty sure it's wrong.  Did this mean to check "is Xen PV"?  Or
was it "is Xen PV or lgeust"?  Or what?

        if (apm_info.bios.version == 0 || paravirt_enabled() ||
machine_is_olpc()) {
                printk(KERN_INFO "apm: BIOS not found.\n");
                return -ENODEV;
        }

I assume that is trying to avoid checking for APM on systems that are
known to be too new.  How about cleanup up the condition to check
something sensible?

        if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
                static int f00f_workaround_enabled;
        [...]

This is asking "are we the natively booted kernel?".  This has nothing
to do with paravirt in particular.  How about just deleting that code?
 It seems pointless.  Sure, it's the responsibility of the real root
kernel, but nothing will break if a guest kernel also does the fixup.

int __init microcode_init(void)
{
        [...]
        if (paravirt_enabled() || dis_ucode_ldr)
                return -EINVAL;

This is also asking "are we the natively booted kernel?"  This is
plausibly useful for real.  (Borislav, is this actually necessary?)
Seems to me there should be a function is_native_root_kernel() or
similar.  Obviously it could have false positives and code will have
to deal with that.  (This also could be entirely wrong.  What code is
responsible for CPU microcode updates on Xen?  For all I know, dom0 is
*supposed* to apply microcode updates, in which case that check really
should be deleted.

void __init reserve_ebda_region(void)
{
         [...]
        if (paravirt_enabled())
                return;

I don't know what the point of this one is.

pnpbios turns itself off if paravirt_enabled().  I'm not convinced
that's correct.

        /* only a natively booted kernel should be using TXT */
        if (paravirt_enabled()) {
                pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
                return;
        }

Er, what's wrong with trying to talk to tboot on paravirt?  It won't
be there unless something is rather wrong.  In any case, this could
use is_native_root_kernel().

        if (paravirt_enabled() && !paravirt_has(RTC))
                return -ENODEV;

This actually seems legit.  But how about reversing it: if
paravirt_has(NO_RTC) return -ENODEV?  Problem solved.

paravirt_enabled is also used in entry_32.S:

cmpl    $0, pv_info+PARAVIRT_enabled

This is actually trying to check whether pv_cpu_ops.iret ==
native_iret.  I sincerely hope that no additional support is *ever*
added to x86 Linux for systems on which this is not the case.

So yeah, I think the right solution is to delete paravirt_enabled.

--Andy
Borislav Petkov Feb. 6, 2016, 10:04 p.m. UTC | #4
On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote:
> int __init microcode_init(void)
> {
>         [...]
>         if (paravirt_enabled() || dis_ucode_ldr)
>                 return -EINVAL;
> 
> This is also asking "are we the natively booted kernel?"  This is
> plausibly useful for real.  (Borislav, is this actually necessary?)

There was some breakage on 32-bit pvops with that.

> Seems to me there should be a function is_native_root_kernel() or
> similar.  Obviously it could have false positives and code will have
> to deal with that.  (This also could be entirely wrong.  What code is
> responsible for CPU microcode updates on Xen?  For all I know, dom0 is
> *supposed* to apply microcode updates, in which case that check really
> should be deleted.

So there are two aspects:

- the guest loading the microcode driver. Xen should behave like
qemu+kvm does: emulate the MSR accesses the microcode loader does.

- microcode application on Xen: we've had this before. The hypervisor
should do that (if it doesn't do so already).

So yes, that paravirt_enabled() thing should go away. Even more so if we
have CPUID leaf 0x4... reserved for hypervisors.
Boris Ostrovsky Feb. 8, 2016, 3:31 p.m. UTC | #5
On 02/06/2016 03:05 PM, Andy Lutomirski wrote:
>
> Anyway, this is all ridiculous.  I propose that rather than trying to
> clean up paravirt_enabled, you just delete it.  Here are its users:
>
> static inline bool is_hypervisor_range(int idx)
> {
>      /*
>       * ffff800000000000 - ffff87ffffffffff is reserved for
>       * the hypervisor.
>       */
>      return paravirt_enabled() &&
>          (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
>          (idx < pgd_index(__PAGE_OFFSET));
> }
>
> Nope, wrong.  I don't really know what this code is trying to do, but
> I'm pretty sure it's wrong.  Did this mean to check "is Xen PV"?  Or
> was it "is Xen PV or lgeust"?  Or what?

This range is reserved for hypervisors but the only hypervisor that uses 
it is Xen PV (lguest doesn't run in 64-bit mode).

The check is intended to catch mappings on baremetal kernels that 
shouldn't be there. In other words it's not Xen PV that needs it, it's 
baremetal that wants to catch errors.


>
>          if (apm_info.bios.version == 0 || paravirt_enabled() ||
> machine_is_olpc()) {
>                  printk(KERN_INFO "apm: BIOS not found.\n");
>                  return -ENODEV;
>          }
>
> I assume that is trying to avoid checking for APM on systems that are
> known to be too new.  How about cleanup up the condition to check
> something sensible?

The check here I suspect is unnecessary since version is taken from 
boot_params and thus should be zero for Xen PV (don't know about lguest 
but if it's not then we could just set it there).

>
>          if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
>                  static int f00f_workaround_enabled;
>          [...]
>
> This is asking "are we the natively booted kernel?".  This has nothing
> to do with paravirt in particular.  How about just deleting that code?
>   It seems pointless.  Sure, it's the responsibility of the real root
> kernel, but nothing will break if a guest kernel also does the fixup.

Yes, I also think so.

>
> int __init microcode_init(void)
> {
>          [...]
>          if (paravirt_enabled() || dis_ucode_ldr)
>                  return -EINVAL;
>
> This is also asking "are we the natively booted kernel?"  This is
> plausibly useful for real.  (Borislav, is this actually necessary?)
> Seems to me there should be a function is_native_root_kernel() or
> similar.  Obviously it could have false positives and code will have
> to deal with that.  (This also could be entirely wrong.  What code is
> responsible for CPU microcode updates on Xen?  For all I know, dom0 is
> *supposed* to apply microcode updates, in which case that check really
> should be deleted.
>
> void __init reserve_ebda_region(void)
> {
>           [...]
>          if (paravirt_enabled())
>                  return;
>
> I don't know what the point of this one is.

Not sure about this one neither.

>
> pnpbios turns itself off if paravirt_enabled().  I'm not convinced
> that's correct.
>
>          /* only a natively booted kernel should be using TXT */
>          if (paravirt_enabled()) {
>                  pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
>                  return;
>          }
>
> Er, what's wrong with trying to talk to tboot on paravirt?  It won't
> be there unless something is rather wrong.  In any case, this could
> use is_native_root_kernel().

As in apm_info case, I don't think this check is necessary.

>
>          if (paravirt_enabled() && !paravirt_has(RTC))
>                  return -ENODEV;
>
> This actually seems legit.  But how about reversing it: if
> paravirt_has(NO_RTC) return -ENODEV?  Problem solved.
>
> paravirt_enabled is also used in entry_32.S:
>
> cmpl    $0, pv_info+PARAVIRT_enabled
>
> This is actually trying to check whether pv_cpu_ops.iret ==
> native_iret.  I sincerely hope that no additional support is *ever*
> added to x86 Linux for systems on which this is not the case.

I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here.

-boris

>
> So yeah, I think the right solution is to delete paravirt_enabled.
>
> --Andy
Boris Ostrovsky Feb. 8, 2016, 3:39 p.m. UTC | #6
On 02/06/2016 05:04 PM, Borislav Petkov wrote:
> On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote:
>> int __init microcode_init(void)
>> {
>>          [...]
>>          if (paravirt_enabled() || dis_ucode_ldr)
>>                  return -EINVAL;
>>
>> This is also asking "are we the natively booted kernel?"  This is
>> plausibly useful for real.  (Borislav, is this actually necessary?)
> There was some breakage on 32-bit pvops with that.
>
>> Seems to me there should be a function is_native_root_kernel() or
>> similar.  Obviously it could have false positives and code will have
>> to deal with that.  (This also could be entirely wrong.  What code is
>> responsible for CPU microcode updates on Xen?  For all I know, dom0 is
>> *supposed* to apply microcode updates, in which case that check really
>> should be deleted.
> So there are two aspects:
>
> - the guest loading the microcode driver. Xen should behave like
> qemu+kvm does: emulate the MSR accesses the microcode loader does.

It does. Very much IIRC, the problem was not caused by an access to MSR 
but rather some sort of address not being available somewhere.

>
> - microcode application on Xen: we've had this before. The hypervisor
> should do that (if it doesn't do so already).

it does.

>
> So yes, that paravirt_enabled() thing should go away. Even more so if we
> have CPUID leaf 0x4... reserved for hypervisors.

I actually think this was the original proposal until we realized we had 
paravirt_enabled(). So we can go back to checking CPUID 0x40000000.

We might also be able to test for (x86_hyper!=NULL) and have guests that 
do microcode management prior to init_hypervisor() rely on hypervisors 
ignoring MSR accesses (as they do today).

-boris
Borislav Petkov Feb. 8, 2016, 3:46 p.m. UTC | #7
On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote:
> This range is reserved for hypervisors but the only hypervisor that uses it
> is Xen PV (lguest doesn't run in 64-bit mode).

Yeah, this is mentioned in arch/x86/include/asm/page_64_types.h:

/*
 * Set __PAGE_OFFSET to the most negative possible address +
 * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
 * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
 * what Xen requires.
 */
#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)

It is also in Documentation/x86/x86_64/mm.txt:

ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor

albeit a bit laconic.

Thanks.
Borislav Petkov Feb. 8, 2016, 3:55 p.m. UTC | #8
On Mon, Feb 08, 2016 at 10:39:43AM -0500, Boris Ostrovsky wrote:
> It does. Very much IIRC, the problem was not caused by an access to MSR but
> rather some sort of address not being available somewhere.

See below.

> >- microcode application on Xen: we've had this before. The hypervisor
> >should do that (if it doesn't do so already).
> 
> it does.

Good.

> >So yes, that paravirt_enabled() thing should go away. Even more so if we
> >have CPUID leaf 0x4... reserved for hypervisors.
> 
> I actually think this was the original proposal until we realized we had
> paravirt_enabled(). So we can go back to checking CPUID 0x40000000.
> 
> We might also be able to test for (x86_hyper!=NULL) and have guests that do
> microcode management prior to init_hypervisor() rely on hypervisors ignoring
> MSR accesses (as they do today).

Right, so the early loader can't do that as on 32-bit it runs even
before paging has been enabled. So I *think* the thing with CPUID would
be best. What does the xen hypervisor return in regs when I do CPUID(4)?
I.e., how do I reliably detect it in the guest?

I can whip up a quick patch and get rid of paravirt_enabled() while at
it...
Andrew Cooper Feb. 8, 2016, 4:05 p.m. UTC | #9
On 08/02/16 15:55, Borislav Petkov wrote:
> On Mon, Feb 08, 2016 at 10:39:43AM -0500, Boris Ostrovsky wrote:
>> It does. Very much IIRC, the problem was not caused by an access to MSR but
>> rather some sort of address not being available somewhere.
> See below.
>
>>> - microcode application on Xen: we've had this before. The hypervisor
>>> should do that (if it doesn't do so already).
>> it does.
> Good.
>
>>> So yes, that paravirt_enabled() thing should go away. Even more so if we
>>> have CPUID leaf 0x4... reserved for hypervisors.
>> I actually think this was the original proposal until we realized we had
>> paravirt_enabled(). So we can go back to checking CPUID 0x40000000.
>>
>> We might also be able to test for (x86_hyper!=NULL) and have guests that do
>> microcode management prior to init_hypervisor() rely on hypervisors ignoring
>> MSR accesses (as they do today).
> Right, so the early loader can't do that as on 32-bit it runs even
> before paging has been enabled. So I *think* the thing with CPUID would
> be best. What does the xen hypervisor return in regs when I do CPUID(4)?
> I.e., how do I reliably detect it in the guest?
>
> I can whip up a quick patch and get rid of paravirt_enabled() while at
> it...
>

For compatibility with other virtualisation specs, Xen's cpuid leaves
shift depending on configuration.

Spec at
http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33

Basically, they are either at 0x40000000, or 0x40000100 if viridian or
vmware compatibility has been enabled.

~Andrew
Boris Ostrovsky Feb. 8, 2016, 4:12 p.m. UTC | #10
On 02/08/2016 11:05 AM, Andrew Cooper wrote:
>
> For compatibility with other virtualisation specs, Xen's cpuid leaves
> shift depending on configuration.
>
> Spec at
> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33
>
> Basically, they are either at 0x40000000, or 0x40000100 if viridian or
> vmware compatibility has been enabled.

(Also in arch/x86/include/asm/xen/cpuid.h in Linux tree).

-boris
Andrew Cooper Feb. 8, 2016, 4:26 p.m. UTC | #11
On 08/02/16 16:12, Boris Ostrovsky wrote:
>
>
> On 02/08/2016 11:05 AM, Andrew Cooper wrote:
>>
>> For compatibility with other virtualisation specs, Xen's cpuid leaves
>> shift depending on configuration.
>>
>> Spec at
>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33
>>
>>
>> Basically, they are either at 0x40000000, or 0x40000100 if viridian or
>> vmware compatibility has been enabled.
>
> (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree).

The other complication is that, for a PV guest, you must use the forced
emulation prefix for the cpuid instruction to trap into the hypervisor.

This either means you must be sure you are under Xen to start with, or
use an extable entry to catch the #UD fault if you are not under Xen.

~Andrew
Boris Ostrovsky Feb. 8, 2016, 4:31 p.m. UTC | #12
On 02/08/2016 11:26 AM, Andrew Cooper wrote:
> On 08/02/16 16:12, Boris Ostrovsky wrote:
>>
>> On 02/08/2016 11:05 AM, Andrew Cooper wrote:
>>> For compatibility with other virtualisation specs, Xen's cpuid leaves
>>> shift depending on configuration.
>>>
>>> Spec at
>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33
>>>
>>>
>>> Basically, they are either at 0x40000000, or 0x40000100 if viridian or
>>> vmware compatibility has been enabled.
>> (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree).
> The other complication is that, for a PV guest, you must use the forced
> emulation prefix for the cpuid instruction to trap into the hypervisor.
>
> This either means you must be sure you are under Xen to start with, or
> use an extable entry to catch the #UD fault if you are not under Xen.

I think we are OK for PV because this code will be executed after pvops 
are set and so we will be calling xen_cpuid().

-boris
Andrew Cooper Feb. 8, 2016, 4:32 p.m. UTC | #13
On 08/02/16 16:31, Boris Ostrovsky wrote:
>
>
> On 02/08/2016 11:26 AM, Andrew Cooper wrote:
>> On 08/02/16 16:12, Boris Ostrovsky wrote:
>>>
>>> On 02/08/2016 11:05 AM, Andrew Cooper wrote:
>>>> For compatibility with other virtualisation specs, Xen's cpuid leaves
>>>> shift depending on configuration.
>>>>
>>>> Spec at
>>>> http://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=xen/include/public/arch-x86/cpuid.h;h=d709340f18d089560b959835eabb7b6609542c7e;hb=HEAD#l33
>>>>
>>>>
>>>>
>>>> Basically, they are either at 0x40000000, or 0x40000100 if viridian or
>>>> vmware compatibility has been enabled.
>>> (Also in arch/x86/include/asm/xen/cpuid.h in Linux tree).
>> The other complication is that, for a PV guest, you must use the forced
>> emulation prefix for the cpuid instruction to trap into the hypervisor.
>>
>> This either means you must be sure you are under Xen to start with, or
>> use an extable entry to catch the #UD fault if you are not under Xen.
>
> I think we are OK for PV because this code will be executed after
> pvops are set and so we will be calling xen_cpuid().

Great.

~Andrew
Andy Lutomirski Feb. 8, 2016, 9:04 p.m. UTC | #14
On Mon, Feb 8, 2016 at 7:31 AM, Boris Ostrovsky
<boris.ostrovsky@oracle.com> wrote:
>
>
> On 02/06/2016 03:05 PM, Andy Lutomirski wrote:
>>
>>
>> Anyway, this is all ridiculous.  I propose that rather than trying to
>> clean up paravirt_enabled, you just delete it.  Here are its users:
>>
>> static inline bool is_hypervisor_range(int idx)
>> {
>>      /*
>>       * ffff800000000000 - ffff87ffffffffff is reserved for
>>       * the hypervisor.
>>       */
>>      return paravirt_enabled() &&
>>          (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
>>          (idx < pgd_index(__PAGE_OFFSET));
>> }
>>
>> Nope, wrong.  I don't really know what this code is trying to do, but
>> I'm pretty sure it's wrong.  Did this mean to check "is Xen PV"?  Or
>> was it "is Xen PV or lgeust"?  Or what?
>
>
> This range is reserved for hypervisors but the only hypervisor that uses it
> is Xen PV (lguest doesn't run in 64-bit mode).
>
> The check is intended to catch mappings on baremetal kernels that shouldn't
> be there. In other words it's not Xen PV that needs it, it's baremetal that
> wants to catch errors.
>
>
>>
>>          if (apm_info.bios.version == 0 || paravirt_enabled() ||
>> machine_is_olpc()) {
>>                  printk(KERN_INFO "apm: BIOS not found.\n");
>>                  return -ENODEV;
>>          }
>>
>> I assume that is trying to avoid checking for APM on systems that are
>> known to be too new.  How about cleanup up the condition to check
>> something sensible?
>
>
> The check here I suspect is unnecessary since version is taken from
> boot_params and thus should be zero for Xen PV (don't know about lguest but
> if it's not then we could just set it there).
>
>>
>>          if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
>>                  static int f00f_workaround_enabled;
>>          [...]
>>
>> This is asking "are we the natively booted kernel?".  This has nothing
>> to do with paravirt in particular.  How about just deleting that code?
>>   It seems pointless.  Sure, it's the responsibility of the real root
>> kernel, but nothing will break if a guest kernel also does the fixup.
>
>
> Yes, I also think so.
>
>>
>> int __init microcode_init(void)
>> {
>>          [...]
>>          if (paravirt_enabled() || dis_ucode_ldr)
>>                  return -EINVAL;
>>
>> This is also asking "are we the natively booted kernel?"  This is
>> plausibly useful for real.  (Borislav, is this actually necessary?)
>> Seems to me there should be a function is_native_root_kernel() or
>> similar.  Obviously it could have false positives and code will have
>> to deal with that.  (This also could be entirely wrong.  What code is
>> responsible for CPU microcode updates on Xen?  For all I know, dom0 is
>> *supposed* to apply microcode updates, in which case that check really
>> should be deleted.
>>
>> void __init reserve_ebda_region(void)
>> {
>>           [...]
>>          if (paravirt_enabled())
>>                  return;
>>
>> I don't know what the point of this one is.
>
>
> Not sure about this one neither.
>
>>
>> pnpbios turns itself off if paravirt_enabled().  I'm not convinced
>> that's correct.
>>
>>          /* only a natively booted kernel should be using TXT */
>>          if (paravirt_enabled()) {
>>                  pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
>>                  return;
>>          }
>>
>> Er, what's wrong with trying to talk to tboot on paravirt?  It won't
>> be there unless something is rather wrong.  In any case, this could
>> use is_native_root_kernel().
>
>
> As in apm_info case, I don't think this check is necessary.
>
>>
>>          if (paravirt_enabled() && !paravirt_has(RTC))
>>                  return -ENODEV;
>>
>> This actually seems legit.  But how about reversing it: if
>> paravirt_has(NO_RTC) return -ENODEV?  Problem solved.
>>
>> paravirt_enabled is also used in entry_32.S:
>>
>> cmpl    $0, pv_info+PARAVIRT_enabled
>>
>> This is actually trying to check whether pv_cpu_ops.iret ==
>> native_iret.  I sincerely hope that no additional support is *ever*
>> added to x86 Linux for systems on which this is not the case.
>
>
> I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here.
>

I think we can do one better and rearrange the code a bit to look more
like the 64-bit version.  See:

commit 7209a75d2009dbf7745e2fd354abf25c3deb3ca3
Author: Andy Lutomirski <luto@amacapital.net>
Date:   Wed Jul 23 08:34:11 2014 -0700

    x86_64/entry/xen: Do not invoke espfix64 on Xen


I'll give this a try soon.

--Andy
Boris Ostrovsky Feb. 8, 2016, 9:49 p.m. UTC | #15
On 02/06/2016 03:59 AM, Luis R. Rodriguez wrote:
> The enumeration of legacy crap by ACPI boot flags seems to provide enough
> details to suit our needs if we really wanted to zero down on the specifics of
> what paravirt_legacy() means, there are these flags:
>
> /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
> #define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
> #define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
> #define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
> #define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
> #define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
> #define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */
>
> I checked and I didn't see qemu using any of the ACPI boot flags,
> but I suspected qemu instances must use a series of legacy crap.
> Likewise for KVM.
>
> coreboot defines legacy free when you don't have any of the above flags set:
> #define ACPI_FADT_LEGACY_FREE   0x00    /* No legacy devices (including 8042) *
>
> Would it be sufficient to just stick to "pv legacy" equivalent
> of requiring just ACPI_FADT_LEGACY_DEVICES and ACPI_FADT_8042 ?

I believe unprivileged PV guests may be compiled without CONFIG_ACPI if 
you are talking about requiring (or manually setting) these flags.

-boris

>
> I should point out It turns out ACPI_FADT_NO_CMOS_RTC matches lguest's and it
> seems that's the only reason we have that RTC PV flag and the features pv
> field... with the linker table + x86 subarch use it should be relatively simple
> to remove paravirt_has_feature() paravirt_has() and PV_SUPPORTED_RTC. lguest
> would just be the only subarch that opts out.
>
>    Luis
Luis Chamberlain Feb. 9, 2016, 6:41 a.m. UTC | #16
On Sat, Feb 06, 2016 at 12:05:32PM -0800, Andy Lutomirski wrote:
> On Sat, Feb 6, 2016 at 12:59 AM, Luis R. Rodriguez <mcgrof@suse.com> wrote:
> > On Fri, Feb 05, 2016 at 11:11:34PM -0800, Andy Lutomirski wrote:
> >> On Feb 5, 2016 8:30 PM, "Luis R. Rodriguez" <mcgrof@kernel.org> wrote:
> >> >
> >> > paravirt_enabled conveys the idea that if this is set or if
> >> > paravirt_enabled() returns true you are in a paravirtualized
> >> > environment. This is not true by any means, and left as-is
> >> > is just causing confusion and is prone to be misused and abused.
> >> >
> >> > This primitive is really only useful to determine if you have a
> >> > paravirtualization hypervisor that supports legacy paravirtualized
> >> > guests. At run time, this tells us if we've booted into a Linux guest
> >> > with support for legacy devices and features.
> >> >
> >> > To avoid further issues with semantics on this we loosely borrow
> >> > the definition of "legacy" from both the ACPI 5.2.9.3 "IA-PC Boot
> >> > Architecture Flags" section and the PC 2001 definition in the PC
> >> > Systems design guide [0]:
> >> >
> >> >   paravirt_legacy() is true if this hypervisor supports legacy
> >> >                     x86 paravirtualized guests.
> >>
> >> This needs to be far more concrete.  I'm reasonably well versed in x86
> >> details relevant to kernels ans I have *no clue* what your semantics
> >> mean.
> >
> > Interesting. I'm glad you're chiming in :)
> >
> >> > +/**
> >> > + * struct pv_info - paravirt hypervisor information
> >> > + *
> >> > + * @supports_x86_legacy: true if this hypervisor supports legacy x86
> >> > + *     paravirtualized guests.  The definition of legacy here adheres
> >> > + *     *loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot
> >> > + *     Architecture Flags" section and the PC 2001 "legacy free" concept [1]
> >> > + *     referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50
> >> > + *     [4].  Legacy x86 guests systems are guest systems which are not "legacy
> >> > + *     free" as per the PC 2001 definition, and in the ACPI sense could have
> >> > + *     any of the legacy ACPI IA-PC Boot architecture flags set. These are x86
> >> > + *     systems with any type of legacy peripherals or requirements.
> >> > + *
> >> > + *     Examples of some popular legacy peripherals:
> >> > + *
> >> > + *       a) Floppy drive
> >> > + *       b) Legacy ports [1] such as such as parallel ports, PS/2 connectors,
> >> > + *          serial ports / RS-232, game ports Parallel ATA, and IEEE 1394
> >> > + *       c) ISA bus
> >> > + *
> >> > + *     Examples of features required to support such type of legacy guests
> >> > + *     are the need for APM and a PNP BIOS.
> >>
> >> Seriously?
> >>
> >> I think you just defined every standard native x86 system
> >> as well as QEMU/KVM as "legacy".
> >
> > I was a afraid it was too broad, but its why I used "loosely".
> >
> >> Can we just enumerate this crap?  I propose:
> >>
> >> Xen PV and lguest are paravirt_legacy.  Nothing else is
> >> paravirt_legacy.
> >
> > Fine by me!
> >
> >> The addition of new paravirt_legacy support is
> >> strongly discouraged.
> >
> > Fine by me, but Boris is re-using that for HVMLite on his recently
> > proposed patches. Do we want that ? I'll also note that the goal
> > is to ensure its hardware_subarch will be 0 (PC), meanwhile old
> > PV will be Xen (although this hasn't been set yet). This mess is
> > part of the reason why I do think stronger semantics and clearer
> > definitions will help here. Without strong semantics I can't help
> > address the dead code concerns I've been rambling over.
> >
> > HVMLite is just a rebranding for PVH done right, and then it seems PVH will be
> > dropped and we'll have HVMLite rebranded back to PVH I guess it seems.
> 
> PVH/HVMlite had better not be "paravirt" in this sense, I hope.

Cc'ing David Vrabel so he's aware of this ongoing discussion, which
relates to the HVMLite series.

As far as I follow both this patch series discussion and the HVMLite patch
series by Boris O, it doesn't sound like we yet have a solution for this given
that as-is the patch series for HVMLite still uses paravirt_enabled = 1.

Perhaps the discussion ongoing on this thread will lead us to a safe way to
distinguish.

> > The enumeration of legacy crap by ACPI boot flags seems to provide enough
> > details to suit our needs if we really wanted to zero down on the specifics of
> > what paravirt_legacy() means, there are these flags:
> >
> > /* Masks for FADT IA-PC Boot Architecture Flags (boot_flags) [Vx]=Introduced in this FADT revision */
> > #define ACPI_FADT_LEGACY_DEVICES    (1)         /* 00: [V2] System has LPC or ISA bus devices */
> > #define ACPI_FADT_8042              (1<<1)      /* 01: [V3] System has an 8042 controller on port 60/64 */
> > #define ACPI_FADT_NO_VGA            (1<<2)      /* 02: [V4] It is not safe to probe for VGA hardware */
> > #define ACPI_FADT_NO_MSI            (1<<3)      /* 03: [V4] Message Signaled Interrupts (MSI) must not be enabled */
> > #define ACPI_FADT_NO_ASPM           (1<<4)      /* 04: [V4] PCIe ASPM control must not be enabled */
> > #define ACPI_FADT_NO_CMOS_RTC       (1<<5)      /* 05: [V5] No CMOS real-time clock present */
> >
> > I checked and I didn't see qemu using any of the ACPI boot flags,
> > but I suspected qemu instances must use a series of legacy crap.
> > Likewise for KVM.
> 
> So shouldn't Linux find out things from the FADT by reading the FADT
> rather than squishing them into a confusing single function?

Boris O replied saying it seems some Xen PV guests can exist without ACPI,
but I was in the same hope as you were as it would seem the architecturally
sane way to do this. PV legacy should ultimately die, so I'm not too
concerned over it, I think we just need a clean way to deprecate it without
causing regressions, but also enabling the best architecture forward.

> Anyway, this is all ridiculous.  I propose that rather than trying to
> clean up paravirt_enabled, you just delete it.

Yay.

> Here are its users:
> 
> static inline bool is_hypervisor_range(int idx)
> {
>     /*
>      * ffff800000000000 - ffff87ffffffffff is reserved for
>      * the hypervisor.
>      */
>     return paravirt_enabled() &&
>         (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
>         (idx < pgd_index(__PAGE_OFFSET));
> }
> 
> Nope, wrong.  I don't really know what this code is trying to do, but
> I'm pretty sure it's wrong.  Did this mean to check "is Xen PV"?  Or
> was it "is Xen PV or lgeust"?  Or what?

It seems Boris and others have replied to this and a solution to replace
it is in sight without paravirt_enabled().

>         if (apm_info.bios.version == 0 || paravirt_enabled() ||
> machine_is_olpc()) {
>                 printk(KERN_INFO "apm: BIOS not found.\n");
>                 return -ENODEV;
>         }
> 
> I assume that is trying to avoid checking for APM on systems that are
> known to be too new.  How about cleanup up the condition to check
> something sensible?

With my patch series I am pretty sure I can remove this check by
relying on the subarch.

>         if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
>                 static int f00f_workaround_enabled;
>         [...]
>
> This is asking "are we the natively booted kernel?".  This has nothing
> to do with paravirt in particular.  How about just deleting that code?
>  It seems pointless.  Sure, it's the responsibility of the real root
> kernel, but nothing will break if a guest kernel also does the fixup.

Likewise. This is avoiding the foof workaround on PV guests as the
hypervisor would have done that itself. Also I was told by Konrad
at the last Xen Developer summit that it seems like we might even
be able to remove the foof workaround from upstream, but this was
just speculation. We could not confirm it.

If we need it I think I can address this with my work.

> int __init microcode_init(void)
> {
>         [...]
>         if (paravirt_enabled() || dis_ucode_ldr)
>                 return -EINVAL;
> 
>
> This is also asking "are we the natively booted kernel?"  This is
> plausibly useful for real.  (Borislav, is this actually necessary?)
> Seems to me there should be a function is_native_root_kernel() or
> similar.  Obviously it could have false positives and code will have
> to deal with that.  (This also could be entirely wrong.  What code is
> responsible for CPU microcode updates on Xen?  For all I know, dom0 is
> *supposed* to apply microcode updates, in which case that check really
> should be deleted.

Likewise. I can remove it but it sounds like others can remove it
through other means as replied to this thread.

> void __init reserve_ebda_region(void)
> {
>          [...]
>         if (paravirt_enabled())
>                 return;
> 
> I don't know what the point of this one is.
> 
> pnpbios turns itself off if paravirt_enabled().  I'm not convinced
> that's correct.

Either way my set of patches address this by relying on the subarch.

>         /* only a natively booted kernel should be using TXT */
>         if (paravirt_enabled()) {
>                 pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
>                 return;
>         }
> 
> Er, what's wrong with trying to talk to tboot on paravirt?  It won't
> be there unless something is rather wrong.  In any case, this could
> use is_native_root_kernel().

If really needed likewise, I think the subarch can be used...

> 
>         if (paravirt_enabled() && !paravirt_has(RTC))
>                 return -ENODEV;
> 
> This actually seems legit.  But how about reversing it: if
> paravirt_has(NO_RTC) return -ENODEV?  Problem solved.

I actually want to remove paravirt_has(NO_RTC), it turns out
that only lguest needs to avoid this, then we not only remove
paravirt_has(NO_RTC) but the RTF flag and the features paravirt
crap. I am pretty sure I can remove this by just relying on
the subarch, LGUEST would just not opt-in for this.

> paravirt_enabled is also used in entry_32.S:
> 
> cmpl    $0, pv_info+PARAVIRT_enabled
> 
> This is actually trying to check whether pv_cpu_ops.iret ==
> native_iret.  I sincerely hope that no additional support is *ever*
> added to x86 Linux for systems on which this is not the case.

Not sure yet as this is early code and that's where I'm having issues,
and its why I've been asking for a simple x86 hypervisr type and seeing
if we can just access it in really early code.

> So yeah, I think the right solution is to delete paravirt_enabled.

Lets do it. Given this is all work we all seem to agree I can just
drop this rename for now, and the other 2 patches can be merged
for now (or ignored, let me know, and then we can just drop things 
as we go forward with either the subarch for Xen being set and used
with the linker table stuff, and/or with the other alternatives that
are being discussed so far on this thread by others.

As noted though, we still need the HVMLite paravirt_enabled() thing
sorted out, if we drop it, will HVMLite have a way to ensure it
also gets a bail on the same routines?

  Luis
Luis Chamberlain Feb. 9, 2016, 6:59 a.m. UTC | #17
On Mon, Feb 08, 2016 at 04:46:08PM +0100, Borislav Petkov wrote:
> On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote:
> > This range is reserved for hypervisors but the only hypervisor that uses it
> > is Xen PV (lguest doesn't run in 64-bit mode).
> 
> Yeah, this is mentioned in arch/x86/include/asm/page_64_types.h:
> 
> /*
>  * Set __PAGE_OFFSET to the most negative possible address +
>  * PGDIR_SIZE*16 (pgd slot 272).  The gap is to allow a space for a
>  * hypervisor to fit.  Choosing 16 slots here is arbitrary, but it's
>  * what Xen requires.
>  */
> #define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
> 
> It is also in Documentation/x86/x86_64/mm.txt:
> 
> ffff800000000000 - ffff87ffffffffff (=43 bits) guard hole, reserved for hypervisor
> 
> albeit a bit laconic.

It doesn't sound like we have a replacement strategy for this yet?
If not, for PV we can rely on the subarch, but only once its set,
I can take this on as part of my work if agreeable given I'm adding
the use of the Xen subarch. If we want this in before linker table
stuff gets fleshed out this I can just split out setting the subarch
on xen_start_kernel() and replacing paravirt_enabled() where
applicable. But -- what about HVMLite for this, given as-is currently
HVMLite is to use PC subarch?

  Luis
Luis Chamberlain Feb. 9, 2016, 7:06 a.m. UTC | #18
On Mon, Feb 08, 2016 at 10:31:36AM -0500, Boris Ostrovsky wrote:
> 
> 
> On 02/06/2016 03:05 PM, Andy Lutomirski wrote:
> >
> >Anyway, this is all ridiculous.  I propose that rather than trying to
> >clean up paravirt_enabled, you just delete it.  Here are its users:
> >
> >static inline bool is_hypervisor_range(int idx)
> >{
> >     /*
> >      * ffff800000000000 - ffff87ffffffffff is reserved for
> >      * the hypervisor.
> >      */
> >     return paravirt_enabled() &&
> >         (idx >= pgd_index(__PAGE_OFFSET) - 16) &&
> >         (idx < pgd_index(__PAGE_OFFSET));
> >}
> >
> >Nope, wrong.  I don't really know what this code is trying to do, but
> >I'm pretty sure it's wrong.  Did this mean to check "is Xen PV"?  Or
> >was it "is Xen PV or lgeust"?  Or what?
> 
> This range is reserved for hypervisors but the only hypervisor that
> uses it is Xen PV (lguest doesn't run in 64-bit mode).
> 
> The check is intended to catch mappings on baremetal kernels that
> shouldn't be there. In other words it's not Xen PV that needs it,
> it's baremetal that wants to catch errors.

OK this is being discussed.

> 
> 
> >
> >         if (apm_info.bios.version == 0 || paravirt_enabled() ||
> >machine_is_olpc()) {
> >                 printk(KERN_INFO "apm: BIOS not found.\n");
> >                 return -ENODEV;
> >         }
> >
> >I assume that is trying to avoid checking for APM on systems that are
> >known to be too new.  How about cleanup up the condition to check
> >something sensible?
> 
> The check here I suspect is unnecessary since version is taken from
> boot_params and thus should be zero for Xen PV (don't know about
> lguest but if it's not then we could just set it there).

Great, but we won't know unless we can test both. Until then we need the check,
but I'm noting a solution is to use the subarch and my series can address this.

> >
> >         if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
> >                 static int f00f_workaround_enabled;
> >         [...]
> >
> >This is asking "are we the natively booted kernel?".  This has nothing
> >to do with paravirt in particular.  How about just deleting that code?
> >  It seems pointless.  Sure, it's the responsibility of the real root
> >kernel, but nothing will break if a guest kernel also does the fixup.
> 
> Yes, I also think so.

Think is very different than knowing, so without a test we won't know.
Until we test it, we could safely replace this check with the subarch
stuff. Both lguest and PV Xen would opt-out as they do now.

> >int __init microcode_init(void)
> >{
> >         [...]
> >         if (paravirt_enabled() || dis_ucode_ldr)
> >                 return -EINVAL;
> >
> >This is also asking "are we the natively booted kernel?"  This is
> >plausibly useful for real.  (Borislav, is this actually necessary?)
> >Seems to me there should be a function is_native_root_kernel() or
> >similar.  Obviously it could have false positives and code will have
> >to deal with that.  (This also could be entirely wrong.  What code is
> >responsible for CPU microcode updates on Xen?  For all I know, dom0 is
> >*supposed* to apply microcode updates, in which case that check really
> >should be deleted.
> >
> >void __init reserve_ebda_region(void)
> >{
> >          [...]
> >         if (paravirt_enabled())
> >                 return;
> >
> >I don't know what the point of this one is.
> 
> Not sure about this one neither.

This is about the BIOS Etra BDA space that the is reserved, we reserve
this on bare metal to give the OS access to this, but on a PV guest
this is not emulated AFAICT -- we skip reserving this space as its
not needed to do so for PV guests. This is what was explained
to me when I asked about this during the session I put together
over dead code concerns and I brought this one up. Konrad and
Andrew might be able to chime in more.

Anyway, with the subarch stuff I'm doing we can keep it and peg
the check with the subarch.

> >
> >pnpbios turns itself off if paravirt_enabled().  I'm not convinced
> >that's correct.
> >
> >         /* only a natively booted kernel should be using TXT */
> >         if (paravirt_enabled()) {
> >                 pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
> >                 return;
> >         }
> >
> >Er, what's wrong with trying to talk to tboot on paravirt?  It won't
> >be there unless something is rather wrong.  In any case, this could
> >use is_native_root_kernel().
> 
> As in apm_info case, I don't think this check is necessary.

Again, *think* is different than knowing. If we want to keep it just
opting out of it for Xen and lguest subarch should suffice here.

> >
> >         if (paravirt_enabled() && !paravirt_has(RTC))
> >                 return -ENODEV;
> >
> >This actually seems legit.  But how about reversing it: if
> >paravirt_has(NO_RTC) return -ENODEV?  Problem solved.
> >
> >paravirt_enabled is also used in entry_32.S:
> >
> >cmpl    $0, pv_info+PARAVIRT_enabled
> >
> >This is actually trying to check whether pv_cpu_ops.iret ==
> >native_iret.  I sincerely hope that no additional support is *ever*
> >added to x86 Linux for systems on which this is not the case.
> 
> I think we can use ALTERNATIVE(... X86_FEATURE_XENPV) here.

Seems Andy is on this.

  Luis
Luis Chamberlain Feb. 17, 2016, 8:07 p.m. UTC | #19
OK so here's a wiki to keep track of progress of the difference uses:

http://kernelnewbies.org/KernelProjects/remove-paravirt-enabled

It seems we have a resolution one way or another for all except for
the use on arch/x86/mm/dump_pagetables.c, is that right? So to be
clear we should not merge more users of paravirt_enabled.

 Luis
diff mbox

Patch

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 4c5228352744..6a248022549c 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -395,7 +395,7 @@  ldt_ss:
 	 * is still available to implement the setting of the high
 	 * 16-bits in the INTERRUPT_RETURN paravirt-op.
 	 */
-	cmpl	$0, pv_info+PARAVIRT_enabled
+	cmpl	$0, pv_info+PARAVIRT_legacy
 	jne	restore_nocheck
 #endif
 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 6542aa99714b..b3885c1f2156 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -14,14 +14,14 @@ 
 #include <linux/types.h>
 #include <linux/cpumask.h>
 
-static inline bool paravirt_enabled(void)
+static inline bool paravirt_legacy(void)
 {
-	return pv_info.paravirt_enabled;
+	return pv_info.supports_x86_legacy;
 }
 
 static inline bool paravirt_has_feature(unsigned int feature)
 {
-	WARN_ON_ONCE(!paravirt_enabled());
+	WARN_ON_ONCE(!paravirt_legacy());
 	return !!(pv_info.features & feature);
 }
 
diff --git a/arch/x86/include/asm/paravirt_types.h b/arch/x86/include/asm/paravirt_types.h
index de2382b023f2..b4094a57435d 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -61,6 +61,37 @@  struct paravirt_callee_save {
 };
 
 /* general info */
+
+/**
+ * struct pv_info - paravirt hypervisor information
+ *
+ * @supports_x86_legacy: true if this hypervisor supports legacy x86
+ * 	paravirtualized guests.  The definition of legacy here adheres
+ * 	*loosely* to both the notion of legacy in the ACPI 5.2.9.3 "IA-PC Boot
+ * 	Architecture Flags" section and the PC 2001 "legacy free" concept [1]
+ * 	referred to in the PC System Design Guide [2] [3] on Chapter 3, Page 50
+ * 	[4].  Legacy x86 guests systems are guest systems which are not "legacy
+ * 	free" as per the PC 2001 definition, and in the ACPI sense could have
+ * 	any of the legacy ACPI IA-PC Boot architecture flags set. These are x86
+ * 	systems with any type of legacy peripherals or requirements.
+ *
+ *	Examples of some popular legacy peripherals:
+ *
+ *	  a) Floppy drive
+ *	  b) Legacy ports [1] such as such as parallel ports, PS/2 connectors,
+ *	     serial ports / RS-232, game ports Parallel ATA, and IEEE 1394
+ *	  c) ISA bus
+ *
+ *	Examples of features required to support such type of legacy guests
+ *	are the need for APM and a PNP BIOS.
+ *
+ * 	[0] http://www.acpi.info/DOWNLOADS/ACPIspec50.pdf
+ *	[1] https://en.wikipedia.org/wiki/Legacy-free_PC
+ *	[2] https://en.wikipedia.org/wiki/PC_System_Design_Guide
+ *	[3] http://tech-insider.org/windows/research/2000/1102.html
+ *	[4] http://tech-insider.org/windows/research/acrobat/001102/03sys-2001.pdf
+ *	[5] https://en.wikipedia.org/wiki/Legacy_port
+ */
 struct pv_info {
 	unsigned int kernel_rpl;
 	int shared_kernel_pmd;
@@ -69,8 +100,8 @@  struct pv_info {
 	u16 extra_user_64bit_cs;  /* __USER_CS if none */
 #endif
 
-	bool paravirt_enabled;
-	unsigned int features;	  /* valid only if paravirt_enabled is set */
+	bool supports_x86_legacy;
+	unsigned int features;	  /* valid only on pv x86 legacy systems */
 	const char *name;
 };
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 5a8e7a61d5be..6052a038b27f 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -470,7 +470,7 @@  static inline unsigned long current_top_of_stack(void)
 #include <asm/paravirt.h>
 #else
 #define __cpuid			native_cpuid
-#define paravirt_enabled()	false
+#define paravirt_legacy()	false
 #define paravirt_has(x) 	false
 
 static inline void load_sp0(struct tss_struct *tss,
diff --git a/arch/x86/kernel/apm_32.c b/arch/x86/kernel/apm_32.c
index 052c9c3026cc..74a3e8ca2c6d 100644
--- a/arch/x86/kernel/apm_32.c
+++ b/arch/x86/kernel/apm_32.c
@@ -2267,7 +2267,7 @@  static int __init apm_init(void)
 
 	dmi_check_system(apm_dmi_table);
 
-	if (apm_info.bios.version == 0 || paravirt_enabled() || machine_is_olpc()) {
+	if (apm_info.bios.version == 0 || paravirt_legacy() || machine_is_olpc()) {
 		printk(KERN_INFO "apm: BIOS not found.\n");
 		return -ENODEV;
 	}
diff --git a/arch/x86/kernel/asm-offsets.c b/arch/x86/kernel/asm-offsets.c
index 84a7524b202c..f49b9cbf0561 100644
--- a/arch/x86/kernel/asm-offsets.c
+++ b/arch/x86/kernel/asm-offsets.c
@@ -59,7 +59,7 @@  void common(void) {
 
 #ifdef CONFIG_PARAVIRT
 	BLANK();
-	OFFSET(PARAVIRT_enabled, pv_info, paravirt_enabled);
+	OFFSET(PARAVIRT_legacy, pv_info, supports_x86_legacy);
 	OFFSET(PARAVIRT_PATCH_pv_cpu_ops, paravirt_patch_template, pv_cpu_ops);
 	OFFSET(PARAVIRT_PATCH_pv_irq_ops, paravirt_patch_template, pv_irq_ops);
 	OFFSET(PV_IRQ_irq_disable, pv_irq_ops, irq_disable);
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 6040bd71679f..aab7d031f612 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -220,7 +220,7 @@  static void intel_workarounds(struct cpuinfo_x86 *c)
 	 * The Quark is also family 5, but does not have the same bug.
 	 */
 	clear_cpu_bug(c, X86_BUG_F00F);
-	if (!paravirt_enabled() && c->x86 == 5 && c->x86_model < 9) {
+	if (!paravirt_legacy() && c->x86 == 5 && c->x86_model < 9) {
 		static int f00f_workaround_enabled;
 
 		set_cpu_bug(c, X86_BUG_F00F);
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index faec7120c508..5137e06028b3 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -630,7 +630,7 @@  int __init microcode_init(void)
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 	int error;
 
-	if (paravirt_enabled() || dis_ucode_ldr)
+	if (paravirt_legacy() || dis_ucode_ldr)
 		return -EINVAL;
 
 	if (c->x86_vendor == X86_VENDOR_INTEL)
diff --git a/arch/x86/kernel/head.c b/arch/x86/kernel/head.c
index 992f442ca155..279fad7288f8 100644
--- a/arch/x86/kernel/head.c
+++ b/arch/x86/kernel/head.c
@@ -38,7 +38,7 @@  void __init reserve_ebda_region(void)
 	 * that the paravirt case can handle memory setup
 	 * correctly, without our help.
 	 */
-	if (paravirt_enabled())
+	if (paravirt_legacy())
 		return;
 
 	/* end of low (conventional) memory */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 5c717b247e1b..177773e75b4d 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -283,14 +283,7 @@  NOKPROBE_SYMBOL(do_async_page_fault);
 static void __init paravirt_ops_setup(void)
 {
 	pv_info.name = "KVM";
-
-	/*
-	 * KVM isn't paravirt in the sense of paravirt_enabled.  A KVM
-	 * guest kernel works like a bare metal kernel with additional
-	 * features, and paravirt_enabled is about features that are
-	 * missing.
-	 */
-	pv_info.paravirt_enabled = false;
+	pv_info.supports_x86_legacy = false;
 
 	if (kvm_para_has_feature(KVM_FEATURE_NOP_IO_DELAY))
 		pv_cpu_ops.io_delay = kvm_io_delay;
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 6b1f205a6ac7..a2ccb328c287 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -294,7 +294,7 @@  enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
 
 struct pv_info pv_info = {
 	.name = "bare hardware",
-	.paravirt_enabled = false,
+	.supports_x86_legacy = false,
 	.kernel_rpl = 0,
 	.shared_kernel_pmd = 1,	/* Only used when CONFIG_X86_PAE is set */
 
diff --git a/arch/x86/kernel/rtc.c b/arch/x86/kernel/rtc.c
index 4af8d063fb36..1d20e3347906 100644
--- a/arch/x86/kernel/rtc.c
+++ b/arch/x86/kernel/rtc.c
@@ -200,7 +200,7 @@  static __init int add_rtc_cmos(void)
 	}
 #endif
 
-	if (paravirt_enabled() && !paravirt_has(RTC))
+	if (paravirt_legacy() && !paravirt_has(RTC))
 		return -ENODEV;
 
 	platform_device_register(&rtc_device);
diff --git a/arch/x86/kernel/tboot.c b/arch/x86/kernel/tboot.c
index 91a4496db434..87a5027ff242 100644
--- a/arch/x86/kernel/tboot.c
+++ b/arch/x86/kernel/tboot.c
@@ -75,7 +75,7 @@  void __init tboot_probe(void)
 	}
 
 	/* only a natively booted kernel should be using TXT */
-	if (paravirt_enabled()) {
+	if (paravirt_legacy()) {
 		pr_warning("non-0 tboot_addr but pv_ops is enabled\n");
 		return;
 	}
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index c6f302f6dedb..dc33156e4d53 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -1408,8 +1408,8 @@  __init void lguest_init(void)
 {
 	/* We're under lguest. */
 	pv_info.name = "lguest";
-	/* Paravirt is enabled. */
-	pv_info.paravirt_enabled = true;
+	/* Hypervisor supports x86 legacy guest systems . */
+	pv_info.supports_x86_legacy = true;
 	/* We're running at privilege level 1, not 0 as normal. */
 	pv_info.kernel_rpl = 1;
 	/* Everyone except Xen runs with this set. */
diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 4a6f1d9b5106..e9cc69c5ad54 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -365,7 +365,7 @@  static inline bool is_hypervisor_range(int idx)
 	 * ffff800000000000 - ffff87ffffffffff is reserved for
 	 * the hypervisor.
 	 */
-	return paravirt_enabled() &&
+	return paravirt_legacy() &&
 		(idx >= pgd_index(__PAGE_OFFSET) - 16) &&
 		(idx < pgd_index(__PAGE_OFFSET));
 }
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index e303e0043881..2485a1613c2f 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -1186,7 +1186,7 @@  static unsigned xen_patch(u8 type, u16 clobbers, void *insnbuf,
 }
 
 static const struct pv_info xen_info __initconst = {
-	.paravirt_enabled = true,
+	.supports_x86_legacy = true,
 	.shared_kernel_pmd = 0,
 
 #ifdef CONFIG_X86_64
diff --git a/drivers/pnp/pnpbios/core.c b/drivers/pnp/pnpbios/core.c
index facd43b8516c..40557bc16005 100644
--- a/drivers/pnp/pnpbios/core.c
+++ b/drivers/pnp/pnpbios/core.c
@@ -521,7 +521,7 @@  static int __init pnpbios_init(void)
 	int ret;
 
 	if (pnpbios_disabled || dmi_check_system(pnpbios_dmi_table) ||
-	    paravirt_enabled()) {
+	    paravirt_legacy()) {
 		printk(KERN_INFO "PnPBIOS: Disabled\n");
 		return -ENODEV;
 	}