[RFC] xen/pvh: detect PVH after kexec

Message ID 20170320182042.6103-1-vkuznets@redhat.com (mailing list archive)
State New, archived

Commit Message

Vitaly Kuznetsov March 20, 2017, 6:20 p.m. UTC
PVH guests after kexec boot like normal HVM guests and we're not entering
xen_prepare_pvh() but we still want to know that we're PVH. This hack does
the job by using XEN_IOPORT_MAGIC but I didn't find any straightforward way
to do it. Did I miss something? Or shall we introduce a CPUID leaf or
something like that?

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Boris Ostrovsky March 20, 2017, 8:21 p.m. UTC | #1
On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> PVH guests after kexec boot like normal HVM guests and we're not entering
> xen_prepare_pvh()

Is it not? Aren't we going via xen_hvm_shutdown() and then
SHUTDOWN_soft_reset which would restart at the same entry point as
regular boot?

-boris


>  but we still want to know that we're PVH. This hack does
> the job by using XEN_IOPORT_MAGIC but I didn't find any straightforward way
> to do it. Did I miss something? Or shall we introduce a CPUID leaf or
> something like that?
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ec1d5c4..4a30886 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -51,6 +51,7 @@
>  #include <xen/hvm.h>
>  #include <xen/hvc-console.h>
>  #include <xen/acpi.h>
> +#include <xen/platform_pci.h>
>  
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
> @@ -1765,6 +1766,20 @@ void __init xen_prepare_pvh(void)
>  
>  	x86_init.oem.arch_setup = xen_pvh_arch_setup;
>  }
> +
> +static void xen_detect_pvh(void)
> +{
> +	short magic;
> +
> +	if (xen_pvh)
> +		return;
> +
> +	magic = inw(XEN_IOPORT_MAGIC);
> +	if (magic != XEN_IOPORT_MAGIC_VAL) {
> +		xen_pvh = 1;
> +		xen_pvh_arch_setup();
> +	}
> +}
>  #endif
>  
>  void __ref xen_hvm_init_shared_info(void)
> @@ -1912,6 +1927,9 @@ static void __init xen_hvm_guest_init(void)
>  
>  	init_hvm_pv_info();
>  
> +	/* Detect PVH booting after kexec */
> +	xen_detect_pvh();
> +
>  	xen_hvm_init_shared_info();
>  
>  	xen_panic_handler_init();
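The detection heuristic in the quoted patch can be modelled outside the kernel. The following is only a minimal user-space sketch, not the patch itself: `port_read16_fn`, `read_port_hvm`, `read_port_pvh` and `detect_pvh` are hypothetical names standing in for `inw()` and `xen_detect_pvh()`, the port constants are the values I believe include/xen/platform_pci.h defines, and the all-ones read for an unclaimed port is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Believed to match include/xen/platform_pci.h; treat as assumptions. */
#define XEN_IOPORT_MAGIC      0x10
#define XEN_IOPORT_MAGIC_VAL  0x49d2

/* Hypothetical stand-in for inw(): reads a 16-bit value from an I/O port. */
typedef uint16_t (*port_read16_fn)(uint16_t port);

/* Mock guest with a device model: the platform device answers the magic port. */
uint16_t read_port_hvm(uint16_t port)
{
	return port == XEN_IOPORT_MAGIC ? XEN_IOPORT_MAGIC_VAL : 0xffff;
}

/* Mock guest without a device model: nothing answers (assumed all-ones). */
uint16_t read_port_pvh(uint16_t port)
{
	(void)port;
	return 0xffff;
}

/*
 * Same decision as xen_detect_pvh() in the patch: if PVH is already known,
 * keep it; otherwise treat a missing magic value as "this is PVH".
 */
bool detect_pvh(bool already_pvh, port_read16_fn read16)
{
	if (already_pvh)
		return true;
	return read16(XEN_IOPORT_MAGIC) != XEN_IOPORT_MAGIC_VAL;
}
```

Calling `detect_pvh(false, read_port_pvh)` classifies the guest as PVH, while `detect_pvh(false, read_port_hvm)` does not. The model also makes the weakness of the hack visible: any guest whose port 0x10 does not return the magic value is treated as PVH, whatever the actual reason.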
Vitaly Kuznetsov March 21, 2017, 9:21 a.m. UTC | #2
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> PVH guests after kexec boot like normal HVM guests and we're not entering
>> xen_prepare_pvh()
>
> Is it not? Aren't we going via xen_hvm_shutdown() and then
> SHUTDOWN_soft_reset which would restart at the same entry point as
> regular boot?

No, we're not doing regular boot: from outside of the guest we don't
really know where the new kernel is placed (as guest does it on its
own). We do soft reset to clean things up and then guest jumps to the
new kernel starting point by itself.

We could (in theory, didn't try) make it jump to the PVH starting point
but we'll have to at least prepare the right boot params for
init_pvh_bootparams and this looks like additional
complication. PVHVM-style startup suits us well but we still need to be
PVH-aware.

[snip].
Roger Pau Monne March 21, 2017, 10:01 a.m. UTC | #3
On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> 
> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> PVH guests after kexec boot like normal HVM guests and we're not entering
> >> xen_prepare_pvh()
> >
> > Is it not? Aren't we going via xen_hvm_shutdown() and then
> > SHUTDOWN_soft_reset which would restart at the same entry point as
> > regular boot?
> 
> No, we're not doing regular boot: from outside of the guest we don't
> really know where the new kernel is placed (as guest does it on its
> own). We do soft reset to clean things up and then guest jumps to the
> new kernel starting point by itself.
> 
> We could (in theory, didn't try) make it jump to the PVH starting point
> but we'll have to at least prepare the right boot params for
> init_pvh_bootparams and this looks like additional
> complication. PVHVM-style startup suits us well but we still need to be
> PVH-aware.

We are going to have the same issue when booting PVH with OVMF, Linux will be
started at the native UEFI entry point, and we will need some way to detect
that we are running in PVH mode.

What issues do you see when using the HVM boot path for kexec?

Roger.
Roger Pau Monne March 21, 2017, 10:07 a.m. UTC | #4
On Tue, Mar 21, 2017 at 10:01:15AM +0000, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> > Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> > 
> > > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> > >> PVH guests after kexec boot like normal HVM guests and we're not entering
> > >> xen_prepare_pvh()
> > >
> > > Is it not? Aren't we going via xen_hvm_shutdown() and then
> > > SHUTDOWN_soft_reset which would restart at the same entry point as
> > > regular boot?
> > 
> > No, we're not doing regular boot: from outside of the guest we don't
> > really know where the new kernel is placed (as guest does it on its
> > own). We do soft reset to clean things up and then guest jumps to the
> > new kernel starting point by itself.
> > 
> > We could (in theory, didn't try) make it jump to the PVH starting point
> > but we'll have to at least prepare the right boot params for
> > init_pvh_bootparams and this looks like additional
> > complication. PVHVM-style startup suits us well but we still need to be
> > PVH-aware.
> 
> We are going to have the same issue when booting PVH with OVMF, Linux will be
> started at the native UEFI entry point, and we will need some way to detect
> that we are running in PVH mode.
> 
> What issues do you see when using the HVM boot path for kexec?

FWIW, I'm wondering what it would take to unify the HVM/PVH paths inside of
Linux. The PVH entry point is still needed in case Linux is booted without any
firmware, but that should just set up the page tables and jump into the native
entry point, where it should join with the HVM code path if possible.

Roger.
Jan Beulich March 21, 2017, 10:07 a.m. UTC | #5
>>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>> 
>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> PVH guests after kexec boot like normal HVM guests and we're not entering
>> >> xen_prepare_pvh()
>> >
>> > Is it not? Aren't we going via xen_hvm_shutdown() and then
>> > SHUTDOWN_soft_reset which would restart at the same entry point as
>> > regular boot?
>> 
>> No, we're not doing regular boot: from outside of the guest we don't
>> really know where the new kernel is placed (as guest does it on its
>> own). We do soft reset to clean things up and then guest jumps to the
>> new kernel starting point by itself.
>> 
>> We could (in theory, didn't try) make it jump to the PVH starting point
>> but we'll have to at least prepare the right boot params for
>> init_pvh_bootparams and this looks like additional
>> complication. PVHVM-style startup suits us well but we still need to be
>> PVH-aware.
> 
> We are going to have the same issue when booting PVH with OVMF, Linux will be
> started at the native UEFI entry point, and we will need some way to detect
> that we are running in PVH mode.

I'm confused: PVH boots without any firmware, doesn't it? Hence
it shouldn't matter if there's no (legacy) BIOS or no OVMF ...

Jan
Roger Pau Monne March 21, 2017, 10:21 a.m. UTC | #6
On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >> 
> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering
> >> >> xen_prepare_pvh()
> >> >
> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then
> >> > SHUTDOWN_soft_reset which would restart at the same entry point as
> >> > regular boot?
> >> 
> >> No, we're not doing regular boot: from outside of the guest we don't
> >> really know where the new kernel is placed (as guest does it on its
> >> own). We do soft reset to clean things up and then guest jumps to the
> >> new kernel starting point by itself.
> >> 
> >> We could (in theory, didn't try) make it jump to the PVH starting point
> >> but we'll have to at least prepare the right boot params for
> >> init_pvh_bootparams and this looks like additional
> >> complication. PVHVM-style startup suits us well but we still need to be
> >> PVH-aware.
> > 
> > We are going to have the same issue when booting PVH with OVMF, Linux will be
> > started at the native UEFI entry point, and we will need some way to detect
> > that we are running in PVH mode.
> 
> I'm confused: PVH boots without any firmware, doesn't it? Hence
> it shouldn't matter if there's no (legacy) BIOS or no OVMF ...

Right now yes, we have no firmware available to PVH at all, but Anthony is
already working on porting OVMF to PVH [0].

Roger

[0] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00953.html
Jan Beulich March 21, 2017, 10:42 a.m. UTC | #7
>>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
> On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
>> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
>> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>> >> 
>> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering
>> >> >> xen_prepare_pvh()
>> >> >
>> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then
>> >> > SHUTDOWN_soft_reset which would restart at the same entry point as
>> >> > regular boot?
>> >> 
>> >> No, we're not doing regular boot: from outside of the guest we don't
>> >> really know where the new kernel is placed (as guest does it on its
>> >> own). We do soft reset to clean things up and then guest jumps to the
>> >> new kernel starting point by itself.
>> >> 
>> >> We could (in theory, didn't try) make it jump to the PVH starting point
>> >> but we'll have to at least prepare the right boot params for
>> >> init_pvh_bootparams and this looks like additional
>> >> complication. PVHVM-style startup suits us well but we still need to be
>> >> PVH-aware.
>> > 
>> > We are going to have the same issue when booting PVH with OVMF, Linux will be
>> > started at the native UEFI entry point, and we will need some way to detect
>> > that we are running in PVH mode.
>> 
>> I'm confused: PVH boots without any firmware, doesn't it? Hence
>> it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
> 
> Right now yes, we have no firmware available to PVH at all, but Anthony is
> already working on porting OVMF to PVH [0].

But that leaves open the "why" aspect: What use is OVMF to a
PVH guest?

Jan
Roger Pau Monne March 21, 2017, 10:59 a.m. UTC | #8
On Tue, Mar 21, 2017 at 04:42:12AM -0600, Jan Beulich wrote:
> >>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
> > On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
> >> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> >> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >> >> 
> >> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering
> >> >> >> xen_prepare_pvh()
> >> >> >
> >> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then
> >> >> > SHUTDOWN_soft_reset which would restart at the same entry point as
> >> >> > regular boot?
> >> >> 
> >> >> No, we're not doing regular boot: from outside of the guest we don't
> >> >> really know where the new kernel is placed (as guest does it on its
> >> >> own). We do soft reset to clean things up and then guest jumps to the
> >> >> new kernel starting point by itself.
> >> >> 
> >> >> We could (in theory, didn't try) make it jump to the PVH starting point
> >> >> but we'll have to at least prepare the right boot params for
> >> >> init_pvh_bootparams and this looks like additional
> >> >> complication. PVHVM-style startup suits us well but we still need to be
> >> >> PVH-aware.
> >> > 
> >> > We are going to have the same issue when booting PVH with OVMF, Linux will be
> >> > started at the native UEFI entry point, and we will need some way to detect
> >> > that we are running in PVH mode.
> >> 
> >> I'm confused: PVH boots without any firmware, doesn't it? Hence
> >> it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
> > 
> > Right now yes, we have no firmware available to PVH at all, but Anthony is
> > already working on porting OVMF to PVH [0].
> 
> But that leaves open the "why" aspect: What use is OVMF to a
> PVH guest?

IMHO it's better than the pvgrub that Xen has been using for PV guests, and has
the bonus that OVMF can probably chainload grub, the FreeBSD loader or whatever
is needed, giving us a lot of flexibility inside PVH guests.

To give a simple example, right now Xen cannot boot FreeBSD PVH guests with
modules, because the ramdisk option cannot be used by it (FreeBSD doesn't have
a ramdisk, the loader loads the needed modules at run-time). If PVH support is
added to OVMF, I should be able to chainload the FreeBSD EFI loader into it and
boot a FreeBSD guest with modules.

Roger.
Andrew Cooper March 21, 2017, 11 a.m. UTC | #9
On 21/03/17 10:42, Jan Beulich wrote:
>>>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
>> On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
>>>>>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering
>>>>>>> xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then
>>>>>> SHUTDOWN_soft_reset which would restart at the same entry point as
>>>>>> regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't
>>>>> really know where the new kernel is placed (as guest does it on its
>>>>> own). We do soft reset to clean things up and then guest jumps to the
>>>>> new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>>>> but we'll have to at least prepare the right boot params for
>>>>> init_pvh_bootparams and this looks like additional
>>>>> complication. PVHVM-style startup suits us well but we still need to be
>>>>> PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>>>> started at the native UEFI entry point, and we will need some way to detect
>>>> that we are running in PVH mode.
>>> I'm confused: PVH boots without any firmware, doesn't it? Hence
>>> it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
>> Right now yes, we have no firmware available to PVH at all, but Anthony is
>> already working on porting OVMF to PVH [0].
> But that leaves open the "why" aspect: What use is OVMF to a
> PVH guest?

1) To work around the massive security attack surface of PV guests.
2) Because we think we can boot Windows without QEMU in this way.

With my XenServer hat on, this is an absolute must.  I want to be
loading a single hvmloader-like-thing (pvhloader?) from dom0, which can
then chainload the guest's preferred bootloader, parse filesystems and
kernels, all in guest context rather than dom0 context.

This also means that when the guest switches to a new filesystem, or
Linux changes its compression, no dom0 modifications are required.

~Andrew
Vitaly Kuznetsov March 21, 2017, 11:53 a.m. UTC | #10
Roger Pau Monne <roger.pau@citrix.com> writes:

> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>> 
>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> PVH guests after kexec boot like normal HVM guests and we're not entering
>> >> xen_prepare_pvh()
>> >
>> > Is it not? Aren't we going via xen_hvm_shutdown() and then
>> > SHUTDOWN_soft_reset which would restart at the same entry point as
>> > regular boot?
>> 
>> No, we're not doing regular boot: from outside of the guest we don't
>> really know where the new kernel is placed (as guest does it on its
>> own). We do soft reset to clean things up and then guest jumps to the
>> new kernel starting point by itself.
>> 
>> We could (in theory, didn't try) make it jump to the PVH starting point
>> but we'll have to at least prepare the right boot params for
>> init_pvh_bootparams and this looks like additional
>> complication. PVHVM-style startup suits us well but we still need to be
>> PVH-aware.
>
> We are going to have the same issue when booting PVH with OVMF, Linux will be
> started at the native UEFI entry point, and we will need some way to detect
> that we are running in PVH mode.
>
> What issues do you see when using the HVM boot path for kexec?

The immediate issue I ran into was the ballooning driver over-allocating
with XENMEM_populate_physmap:

(XEN) Dom15 callback via changed to Direct Vector 0xf3
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
(XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
(XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
...

I didn't investigate why it happens, but setting xen_pvh=1 helped. Not sure
if it's related, but I see the following code in __gnttab_init():

	/* Delay grant-table initialization in the PV on HVM case */
	if (xen_hvm_domain() && !xen_pvh_domain())
		return 0;

and gnttab_init() is later called in platform_pci_probe().
Roger Pau Monne March 21, 2017, 12:13 p.m. UTC | #11
On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
> Roger Pau Monne <roger.pau@citrix.com> writes:
> 
> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >> 
> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering
> >> >> xen_prepare_pvh()
> >> >
> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then
> >> > SHUTDOWN_soft_reset which would restart at the same entry point as
> >> > regular boot?
> >> 
> >> No, we're not doing regular boot: from outside of the guest we don't
> >> really know where the new kernel is placed (as guest does it on its
> >> own). We do soft reset to clean things up and then guest jumps to the
> >> new kernel starting point by itself.
> >> 
> >> We could (in theory, didn't try) make it jump to the PVH starting point
> >> but we'll have to at least prepare the right boot params for
> >> init_pvh_bootparams and this looks like additional
> >> complication. PVHVM-style startup suits us well but we still need to be
> >> PVH-aware.
> >
> > We are going to have the same issue when booting PVH with OVMF, Linux will be
> > started at the native UEFI entry point, and we will need some way to detect
> > that we are running in PVH mode.
> >
> > What issues do you see when using the HVM boot path for kexec?
> 
> The immediate issue I ran into was ballooning driver over-allocating
> with XENMEM_populate_physmap:
> 
> (XEN) Dom15 callback via changed to Direct Vector 0xf3
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> ...
> 
> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
> if it's related, but I see the following code in __gnttab_init():
> 
> 	/* Delay grant-table initialization in the PV on HVM case */
> 	if (xen_hvm_domain() && !xen_pvh_domain())
> 		return 0;
> 
> and gnttab_init() is later called in platform_pci_probe().

But I guess this never happens in the PVH case because there's no Xen platform
PCI device?

Making the initialization of the grant tables conditional on the presence of
the Xen platform PCI device seems wrong. The only thing needed for grant tables
is a physical memory region. This can either be picked from unused physical
memory (over 4GB to avoid collisions), or by freeing some RAM region.

Roger.
Boris Ostrovsky March 21, 2017, 2:05 p.m. UTC | #12
On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>
>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>
>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering
>>>>>> xen_prepare_pvh()
>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then
>>>>> SHUTDOWN_soft_reset which would restart at the same entry point as
>>>>> regular boot?
>>>> No, we're not doing regular boot: from outside of the guest we don't
>>>> really know where the new kernel is placed (as guest does it on its
>>>> own). We do soft reset to clean things up and then guest jumps to the
>>>> new kernel starting point by itself.
>>>>
>>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>>> but we'll have to at least prepare the right boot params for
>>>> init_pvh_bootparams and this looks like additional
>>>> complication. PVHVM-style startup suits us well but we still need to be
>>>> PVH-aware.
>>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>>> started at the native UEFI entry point, and we will need some way to detect
>>> that we are running in PVH mode.
>>>
>>> What issues do you see when using the HVM boot path for kexec?
>> The immediate issue I ran into was ballooning driver over-allocating
>> with XENMEM_populate_physmap:


I couldn't even get that far. Is there anything besides the two libxl
patches that you posted yesterday?

>>
>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> ...
>>
>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
>> if it's related, but I see the following code in __gnttab_init():
>>
>> 	/* Delay grant-table initialization in the PV on HVM case */
>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>> 		return 0;
>>
>> and gnttab_init() is later called in platform_pci_probe().
> But I guess this never happens in the PVH case because there's no Xen platform
> PCI device?
>
> Making the initialization of the grant tables conditional to the presence of
> the Xen platform PCI device seems wrong. The only thing needed for grant tables
> is a physical memory region. This can either be picked from unused physical
> memory (over 4GB to avoid collisions), or by freeing some RAM region.

That's because Linux HVM guests use the PCI MMIO region for grant tables
(see platform_pci_probe()).

-boris
Roger Pau Monne March 21, 2017, 2:16 p.m. UTC | #13
On Tue, Mar 21, 2017 at 10:05:27AM -0400, Boris Ostrovsky wrote:
> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
> > On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
> >> Roger Pau Monne <roger.pau@citrix.com> writes:
> >>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
> >> if it's related, but I see the following code in __gnttab_init():
> >>
> >> 	/* Delay grant-table initialization in the PV on HVM case */
> >> 	if (xen_hvm_domain() && !xen_pvh_domain())
> >> 		return 0;
> >>
> >> and gnttab_init() is later called in platform_pci_probe().
> > But I guess this never happens in the PVH case because there's no Xen platform
> > PCI device?
> >
> > Making the initialization of the grant tables conditional to the presence of
> > the Xen platform PCI device seems wrong. The only thing needed for grant tables
> > is a physical memory region. This can either be picked from unused physical
> > memory (over 4GB to avoid collisions), or by freeing some RAM region.
> 
> That's because Linux HVM guests use PCI MMIO region for grant tables
> (see platform_pci_probe()).

There's no limitation in Xen that forces HVM guests to use the PCI MMIO hole of
the Xen PCI device for the grant table. You can safely use a RAM region, or an
unused physical range, probably above 4GB for safety. I'm not sure what other
things prevent booting a PVH guest using the same path as HVM; I guess the ACPI
SCI interrupt is also one of them.

I wonder if it would make sense to announce using CPUID the things that differ
from HVM (like the SCI over event channels), instead of simply advertising PVH.
Boris, do you have a list of differences that prevent PVH from using the HVM
code paths?

Roger.
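The idea of announcing individual differences through CPUID, rather than a single "this is PVH" flag, could look roughly like the sketch below. Everything in it is hypothetical: no such leaf or bit layout is defined anywhere in this thread, and the `XEN_FEAT_*` names, `struct guest_quirks` and `decode_quirks()` are invented purely for illustration. The feature word is passed in as a plain integer so the example does not depend on executing CPUID.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; not an existing Xen CPUID interface. */
#define XEN_FEAT_SCI_OVER_EVTCHN  (1u << 0) /* ACPI SCI arrives via event channel */
#define XEN_FEAT_NO_PLATFORM_PCI  (1u << 1) /* no Xen platform PCI device present */

/* What a guest would conclude from the advertised bits. */
struct guest_quirks {
	bool sci_over_evtchn;         /* do not expect a legacy SCI interrupt */
	bool gnttab_needs_own_region; /* no platform-device MMIO BAR to borrow */
};

/* Decode a (hypothetical) feature word into per-feature decisions. */
struct guest_quirks decode_quirks(uint32_t features)
{
	struct guest_quirks q;

	q.sci_over_evtchn = (features & XEN_FEAT_SCI_OVER_EVTCHN) != 0;
	q.gnttab_needs_own_region = (features & XEN_FEAT_NO_PLATFORM_PCI) != 0;
	return q;
}
```

With this shape each code path tests only the difference it cares about, so a guest booted through the HVM entry point after kexec or via OVMF would not need a global PVH flag at all.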
Vitaly Kuznetsov March 21, 2017, 2:35 p.m. UTC | #14
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
>> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering
>>>>>>> xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then
>>>>>> SHUTDOWN_soft_reset which would restart at the same entry point as
>>>>>> regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't
>>>>> really know where the new kernel is placed (as guest does it on its
>>>>> own). We do soft reset to clean things up and then guest jumps to the
>>>>> new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>>>> but we'll have to at least prepare the right boot params for
>>>>> init_pvh_bootparams and this looks like additional
>>>>> complication. PVHVM-style startup suits us well but we still need to be
>>>>> PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>>>> started at the native UEFI entry point, and we will need some way to detect
>>>> that we are running in PVH mode.
>>>>
>>>> What issues do you see when using the HVM boot path for kexec?
>>> The immediate issue I ran into was ballooning driver over-allocating
>>> with XENMEM_populate_physmap:
>
> I couldn't go even that far. Is there anything besides the two libxl
> patches that you posted yesterday?
>

No, the two patches should be enough. I only tested kdump so far.
Vitaly Kuznetsov March 21, 2017, 2:44 p.m. UTC | #15
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Roger Pau Monne <roger.pau@citrix.com> writes:
>
>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>> 
>>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>> >> PVH guests after kexec boot like normal HVM guests and we're not entering
>>> >> xen_prepare_pvh()
>>> >
>>> > Is it not? Aren't we going via xen_hvm_shutdown() and then
>>> > SHUTDOWN_soft_reset which would restart at the same entry point as
>>> > regular boot?
>>> 
>>> No, we're not doing regular boot: from outside of the guest we don't
>>> really know where the new kernel is placed (as guest does it on its
>>> own). We do soft reset to clean things up and then guest jumps to the
>>> new kernel starting point by itself.
>>> 
>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>> but we'll have to at least prepare the right boot params for
>>> init_pvh_bootparams and this looks like additional
>>> complication. PVHVM-style startup suits us well but we still need to be
>>> PVH-aware.
>>
>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>> started at the native UEFI entry point, and we will need some way to detect
>> that we are running in PVH mode.
>>
>> What issues do you see when using the HVM boot path for kexec?
>
> The immediate issue I ran into was ballooning driver over-allocating
> with XENMEM_populate_physmap:
>
> (XEN) Dom15 callback via changed to Direct Vector 0xf3
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> ...
>
> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
> if it's related, but I see the following code in __gnttab_init():
>
> 	/* Delay grant-table initialization in the PV on HVM case */
> 	if (xen_hvm_domain() && !xen_pvh_domain())
> 		return 0;
>
> and gnttab_init() is later called in platform_pci_probe().

Two more things:

There is the xen_pvh_gnttab_setup() initcall, which does

	if (!xen_pvh_domain())
		return -ENODEV;

and this is probably the source of the over-allocation.

xen_has_pv_devices() has the following:

	if (xen_pv_domain() || xen_pvh_domain())
		return true;

which also has to be patched for PVH-after-kexec. So it seems we either
have to remove all this stuff somehow or make PVH-ness detectable...
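To see why an undetected PVH guest trips over all three checks at once, they can be modelled as plain predicates. This is a user-space sketch under stated assumptions: `struct xen_guest_state` and the three function names are hypothetical, and each function mirrors only the fragment quoted in this thread, not the full kernel function.

```c
#include <stdbool.h>

/* Hypothetical snapshot of what the kernel believes about the guest. */
struct xen_guest_state {
	bool pv;  /* xen_pv_domain()  */
	bool hvm; /* xen_hvm_domain() */
	bool pvh; /* xen_pvh_domain() */
};

/* Quoted check in __gnttab_init(): true means init waits for platform_pci_probe(). */
bool gnttab_init_deferred(const struct xen_guest_state *s)
{
	return s->hvm && !s->pvh;
}

/* Quoted check in xen_pvh_gnttab_setup(): true means the initcall proceeds. */
bool pvh_gnttab_setup_runs(const struct xen_guest_state *s)
{
	return s->pvh;
}

/* Quoted fragment of xen_has_pv_devices(): true means the early "return true". */
bool has_pv_devices_early_true(const struct xen_guest_state *s)
{
	return s->pv || s->pvh;
}
```

For a state of `{ .pv = false, .hvm = true, .pvh = false }`, which is what the kexec'ed kernel sees without detection, grant-table init is deferred to a platform device probe that never happens, the PVH grant-table setup bails out, and the PV-device shortcut is not taken. Setting `.pvh = true` flips all three.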
Boris Ostrovsky March 21, 2017, 3:01 p.m. UTC | #16
On 03/21/2017 10:16 AM, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 10:05:27AM -0400, Boris Ostrovsky wrote:
>> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
>>> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
>>>> if it's related, but I see the following code in __gnttab_init():
>>>>
>>>> 	/* Delay grant-table initialization in the PV on HVM case */
>>>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>>>> 		return 0;
>>>>
>>>> and gnttab_init() is later called in platform_pci_probe().
>>> But I guess this never happens in the PVH case because there's no Xen platform
>>> PCI device?
>>>
>>> Making the initialization of the grant tables conditional to the presence of
>>> the Xen platform PCI device seems wrong. The only thing needed for grant tables
>>> is a physical memory region. This can either be picked from unused physical
>>> memory (over 4GB to avoid collisions), or by freeing some RAM region.
>> That's because Linux HVM guests use PCI MMIO region for grant tables
>> (see platform_pci_probe()).
> There's no limitation in Xen that forces HVM guests to use the PCI MMIO hole of
> the Xen PCI device for the grant table. You can safely use a RAM region, or an
> unused physical range, probably above 4GB for safety. I'm not sure about what
> other things prevent booting a PVH guest using the same path as HVM, I guess
> the ACPI SCI interrupt is also one of them.


I think (hope?) using ACPI_IRQ_MODEL_PLATFORM for PVH guests takes care
of this.


>
> I wonder if it would make sense to announce using CPUID the things that differ
> from HVM (like the SCI over event channels), instead of simply advertising PVH.
> Boris, do you have a list of differences that prevent PVH from using the HVM
> code paths?

There isn't much, really. And most of them are discoverable already. For
example, we choose acpi_irq_model based on availability of IOAPICs,
which we find out by parsing MADT.

Similarly, for the problem at hand (lack of PCI platform device) we
should be able to just search PCI space and not find it, shouldn't we?
We may need to change the order in which things are done.


-boris
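
The "search PCI space and not find it" idea can be sketched as a lookup for
the Xen platform device ID. This is a userspace model under assumptions: the
device list is a plain array rather than real PCI config-space access,
model_has_platform_device() is a hypothetical name, and the IDs are taken to
be vendor 0x5853, device 0x0001. Whether PCI enumeration is available early
enough is exactly the ordering question raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed IDs of the Xen platform PCI device. */
#define PCI_VENDOR_ID_XEN		0x5853
#define PCI_DEVICE_ID_XEN_PLATFORM	0x0001

struct pci_id {
	uint16_t vendor;
	uint16_t device;
};

/*
 * Userspace model: 'devs' stands in for whatever the kernel would see
 * when enumerating PCI config space. Returns true if the Xen platform
 * device is among them.
 */
static bool model_has_platform_device(const struct pci_id *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (devs[i].vendor == PCI_VENDOR_ID_XEN &&
		    devs[i].device == PCI_DEVICE_ID_XEN_PLATFORM)
			return true;
	}
	return false;
}
```

A guest for which this returns false has no platform_pci_probe() to rely on,
so grant-table setup would have to happen elsewhere regardless of whether the
guest is called PVH.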
Boris Ostrovsky March 21, 2017, 3:14 p.m. UTC | #17
On 03/21/2017 10:44 AM, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>
>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>
>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering
>>>>>> xen_prepare_pvh()
>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then
>>>>> SHUTDOWN_soft_reset which would restart at the same entry point as
>>>>> regular boot?
>>>> No, we're not doing regular boot: from outside of the guest we don't
>>>> really know where the new kernel is placed (as guest does it on its
>>>> own). We do soft reset to clean things up and then guest jumps to the
>>>> new kernel starting point by itself.
>>>>
>>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>>> but we'll have to at least prepare the right boot params for
>>>> init_pvh_bootparams and this looks like additional
>>>> complication. PVHVM-style startup suits us well but we still need to be
>>>> PVH-aware.
>>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>>> started at the native UEFI entry point, and we will need some way to detect
>>> that we are running in PVH mode.
>>>
>>> What issues do you see when using the HVM boot path for kexec?
>> The immediate issue I ran into was ballooning driver over-allocating
>> with XENMEM_populate_physmap:
>>
>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> ...
>>
>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
>> if it's related, but I see the following code in __gnttab_init():
>>
>> 	/* Delay grant-table initialization in the PV on HVM case */
>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>> 		return 0;
>>
>> and gnttab_init() is later called in platform_pci_probe().
> Two more things:
>
> There is xen_pvh_gnttab_setup() initcall which does 
>
>  	if (!xen_pvh_domain())
>                 return -ENODEV;
>
> and this is probably the source of over-allocation.
>
> xen_has_pv_devices() has the following:
>
>         if (xen_pv_domain() || xen_pvh_domain())
>                 return true;
>
> which also has to be patched for PVH-after-kexec. So it seems we either
> have to remove all this stuff somehow or make PVH-ness detectable...
>


Can we read xenstore's dm-version? Or is it too early to get access there?

-boris
Vitaly Kuznetsov March 21, 2017, 5:10 p.m. UTC | #18
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/21/2017 10:44 AM, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>>
>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering
>>>>>>> xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then
>>>>>> SHUTDOWN_soft_reset which would restart at the same entry point as
>>>>>> regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't
>>>>> really know where the new kernel is placed (as guest does it on its
>>>>> own). We do soft reset to clean things up and then guest jumps to the
>>>>> new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point
>>>>> but we'll have to at least prepare the right boot params for
>>>>> init_pvh_bootparams and this looks like additional
>>>>> complication. PVHVM-style startup suits us well but we still need to be
>>>>> PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be
>>>> started at the native UEFI entry point, and we will need some way to detect
>>>> that we are running in PVH mode.
>>>>
>>>> What issues do you see when using the HVM boot path for kexec?
>>> The immediate issue I ran into was ballooning driver over-allocating
>>> with XENMEM_populate_physmap:
>>>
>>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> ...
>>>
>>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure
>>> if it's related, but I see the following code in __gnttab_init():
>>>
>>> 	/* Delay grant-table initialization in the PV on HVM case */
>>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>>> 		return 0;
>>>
>>> and gnttab_init() is later called in platform_pci_probe().
>> Two more things:
>>
>> There is xen_pvh_gnttab_setup() initcall which does 
>>
>>  	if (!xen_pvh_domain())
>>                 return -ENODEV;
>>
>> and this is probably the source of over-allocation.
>>
>> xen_has_pv_devices() has the following:
>>
>>         if (xen_pv_domain() || xen_pvh_domain())
>>                 return true;
>>
>> which also has to be patched for PVH-after-kexec. So it seems we either
>> have to remove all this stuff somehow or make PVH-ness detectable...
>>
>
> Can we read xenstore's dm-version? Or is it too early to get access there?
>

Seems to be too late: xenbus_init() is a postcore_initcall. Moreover,
'dm-version' seems to live under 'libxl' in xenstore, and I'm not sure
it's readable from the domain.
Roger Pau Monne March 21, 2017, 5:28 p.m. UTC | #19
On Tue, Mar 21, 2017 at 06:10:06PM +0100, Vitaly Kuznetsov wrote:
> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> > Can we read xenstore's dm-version? Or is it too early to get access there?
> >
> 
> Seems to be too late: xenbus_init() is a postcore_initcall. Moreover, it
> seems that 'dm-version' lives under 'libxl' in xenstore, not sure if
> it's readable from the domain.

I don't think you can trust anything inside of "libxl/" to be stable.

Roger.
Patch

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ec1d5c4..4a30886 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -51,6 +51,7 @@ 
 #include <xen/hvm.h>
 #include <xen/hvc-console.h>
 #include <xen/acpi.h>
+#include <xen/platform_pci.h>
 
 #include <asm/paravirt.h>
 #include <asm/apic.h>
@@ -1765,6 +1766,20 @@  void __init xen_prepare_pvh(void)
 
 	x86_init.oem.arch_setup = xen_pvh_arch_setup;
 }
+
+static void xen_detect_pvh(void)
+{
+	short magic;
+
+	if (xen_pvh)
+		return;
+
+	magic = inw(XEN_IOPORT_MAGIC);
+	if (magic != XEN_IOPORT_MAGIC_VAL) {
+		xen_pvh = 1;
+		xen_pvh_arch_setup();
+	}
+}
 #endif
 
 void __ref xen_hvm_init_shared_info(void)
@@ -1912,6 +1927,9 @@  static void __init xen_hvm_guest_init(void)
 
 	init_hvm_pv_info();
 
+	/* Detect PVH booting after kexec */
+	xen_detect_pvh();
+
 	xen_hvm_init_shared_info();
 
 	xen_panic_handler_init();
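
For readers following the logic of xen_detect_pvh() above, the decision can
be modelled in userspace as below. This is a sketch under assumptions: the
port read is replaced by a callback, the model_* and fake_* names are
hypothetical, and the constants are assumed to match xen/platform_pci.h
(port 0x10, magic 0x49d2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the definitions in xen/platform_pci.h. */
#define XEN_IOPORT_MAGIC	0x10
#define XEN_IOPORT_MAGIC_VAL	0x49d2

/* Stand-in for inw(): returns the 16-bit value read from 'port'. */
typedef uint16_t (*port_read_fn)(uint16_t port);

/* Guest with the platform device: the magic port answers. */
static uint16_t fake_inw_with_platform_device(uint16_t port)
{
	return port == XEN_IOPORT_MAGIC ? XEN_IOPORT_MAGIC_VAL : 0xffff;
}

/* Guest without the platform device: nothing decodes the port. */
static uint16_t fake_inw_without_platform_device(uint16_t port)
{
	(void)port;
	return 0xffff;
}

/*
 * Mirrors the patch: keep an already-set flag, otherwise treat a
 * missing magic value as "this is PVH".
 */
static bool model_detect_pvh(bool already_pvh, port_read_fn read_port)
{
	assert(read_port);

	if (already_pvh)
		return true;
	return read_port(XEN_IOPORT_MAGIC) != XEN_IOPORT_MAGIC_VAL;
}
```

The model also makes the limitation of the hack visible: any HVM guest that
does not expose the magic port would be classified as PVH too, which may be
one reason a dedicated indicator such as a CPUID leaf is being asked about.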