| Message ID | 20170320182042.6103-1-vkuznets@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> PVH guests after kexec boot like normal HVM guests and we're not entering
> xen_prepare_pvh()

Is it not? Aren't we going via xen_hvm_shutdown() and then
SHUTDOWN_soft_reset which would restart at the same entry point as
regular boot?

-boris

> but we still want to know that we're PVH. This hack does the job by using
> XEN_IOPORT_MAGIC but I didn't find any straightforward way to do it. Did I
> miss something? Or shall we introduce a CPUID leaf or something like that?
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
>  arch/x86/xen/enlighten.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
> index ec1d5c4..4a30886 100644
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -51,6 +51,7 @@
>  #include <xen/hvm.h>
>  #include <xen/hvc-console.h>
>  #include <xen/acpi.h>
> +#include <xen/platform_pci.h>
>
>  #include <asm/paravirt.h>
>  #include <asm/apic.h>
> @@ -1765,6 +1766,20 @@ void __init xen_prepare_pvh(void)
>
>  	x86_init.oem.arch_setup = xen_pvh_arch_setup;
>  }
> +
> +static void xen_detect_pvh(void)
> +{
> +	short magic;
> +
> +	if (xen_pvh)
> +		return;
> +
> +	magic = inw(XEN_IOPORT_MAGIC);
> +	if (magic != XEN_IOPORT_MAGIC_VAL) {
> +		xen_pvh = 1;
> +		xen_pvh_arch_setup();
> +	}
> +}
>  #endif
>
>  void __ref xen_hvm_init_shared_info(void)
> @@ -1912,6 +1927,9 @@ static void __init xen_hvm_guest_init(void)
>
>  	init_hvm_pv_info();
>
> +	/* Detect PVH booting after kexec */
> +	xen_detect_pvh();
> +
>  	xen_hvm_init_shared_info();
>
>  	xen_panic_handler_init();
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>
> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?

No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.

We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.

[snip]
On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>
> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
> >
> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>
> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>
> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.

We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.

What issues do you see when using the HVM boot path for kexec?

Roger.
On Tue, Mar 21, 2017 at 10:01:15AM +0000, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> > Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >
> > > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> > >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
> > >
> > > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
> >
> > No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
> >
> > We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>
> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>
> What issues do you see when using the HVM boot path for kexec?

FWIW, I'm wondering what it would take to unify the HVM/PVH paths inside of Linux. The PVH entry point is still needed in case Linux is booted without any firmware, but that should just set up the page tables and jump into the native entry point, where it should join with the HVM code path if possible.

Roger.
>>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>
>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>> >
>> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>
>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>
>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>
> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.

I'm confused: PVH boots without any firmware, doesn't it? Hence it shouldn't matter if there's no (legacy) BIOS or no OVMF ...

Jan
On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >>
> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
> >> >
> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
> >>
> >> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
> >>
> >> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
> >
> > We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>
> I'm confused: PVH boots without any firmware, doesn't it? Hence it shouldn't matter if there's no (legacy) BIOS or no OVMF ...

Right now yes, we have no firmware available to PVH at all, but Anthony is already working on porting OVMF to PVH [0].

Roger

[0] https://lists.xenproject.org/archives/html/xen-devel/2016-12/msg00953.html
>>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
> On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
>> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
>> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>> >>
>> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>> >> >
>> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>> >>
>> >> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>> >>
>> >> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>> >
>> > We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>
>> I'm confused: PVH boots without any firmware, doesn't it? Hence it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
>
> Right now yes, we have no firmware available to PVH at all, but Anthony is already working on porting OVMF to PVH [0].

But that leaves open the "why" aspect: What use is OVMF to a PVH guest?

Jan
On Tue, Mar 21, 2017 at 04:42:12AM -0600, Jan Beulich wrote:
> >>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
> > On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
> >> >>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
> >> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >> >>
> >> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
> >> >> >
> >> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
> >> >>
> >> >> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
> >> >>
> >> >> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
> >> >
> >> > We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
> >>
> >> I'm confused: PVH boots without any firmware, doesn't it? Hence it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
> >
> > Right now yes, we have no firmware available to PVH at all, but Anthony is already working on porting OVMF to PVH [0].
>
> But that leaves open the "why" aspect: What use is OVMF to a PVH guest?

IMHO it's better than pvgrub that Xen has been using for PV guests, and it has the bonus that OVMF can probably chainload grub, the FreeBSD loader or whatever is needed, giving us a lot of flexibility inside PVH guests.

To give a simple example, right now Xen cannot boot FreeBSD PVH guests with modules, because the ramdisk option cannot be used by it (FreeBSD doesn't have a ramdisk, the loader loads the needed modules at run-time). If PVH support is added to OVMF, I should be able to chainload the FreeBSD EFI loader into it and boot a FreeBSD guest with modules.

Roger.
On 21/03/17 10:42, Jan Beulich wrote:
>>>> On 21.03.17 at 11:21, <roger.pau@citrix.com> wrote:
>> On Tue, Mar 21, 2017 at 04:07:51AM -0600, Jan Beulich wrote:
>>>>>> On 21.03.17 at 11:01, <roger.pau@citrix.com> wrote:
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>> I'm confused: PVH boots without any firmware, doesn't it? Hence it shouldn't matter if there's no (legacy) BIOS or no OVMF ...
>> Right now yes, we have no firmware available to PVH at all, but Anthony is already working on porting OVMF to PVH [0].
> But that leaves open the "why" aspect: What use is OVMF to a PVH guest?

1) To work around the massive security attack surface of PV guests.

2) Because we think we can boot Windows without Qemu in this way.

With my XenServer hat on, this is an absolute must. I want to be loading a single hvmloader-like thing (pvhloader?) from dom0, which can then chainload the guest's preferred bootloader, parse filesystems and kernels, all in guest context rather than dom0 context. This also means that when the guest switches to a new filesystem, or Linux changes its compression, no dom0 modifications are required.

~Andrew
Roger Pau Monne <roger.pau@citrix.com> writes:

> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>
>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>> >
>> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>
>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>
>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>
> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>
> What issues do you see when using the HVM boot path for kexec?

The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:

(XEN) Dom15 callback via changed to Direct Vector 0xf3
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
(XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
(XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
(XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
...

I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():

	/* Delay grant-table initialization in the PV on HVM case */
	if (xen_hvm_domain() && !xen_pvh_domain())
		return 0;

and gnttab_init() is later called in platform_pci_probe().
On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
> Roger Pau Monne <roger.pau@citrix.com> writes:
>
> > On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
> >>
> >> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
> >> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
> >> >
> >> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
> >>
> >> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
> >>
> >> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
> >
> > We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
> >
> > What issues do you see when using the HVM boot path for kexec?
>
> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:
>
> (XEN) Dom15 callback via changed to Direct Vector 0xf3
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> ...
>
> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>
> 	/* Delay grant-table initialization in the PV on HVM case */
> 	if (xen_hvm_domain() && !xen_pvh_domain())
> 		return 0;
>
> and gnttab_init() is later called in platform_pci_probe().

But I guess this never happens in the PVH case because there's no Xen platform PCI device?

Making the initialization of the grant tables conditional to the presence of the Xen platform PCI device seems wrong. The only thing needed for grant tables is a physical memory region. This can either be picked from unused physical memory (over 4GB to avoid collisions), or by freeing some RAM region.

Roger.
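For reference, placing grant-table frames without the platform PCI device comes down to XENMEM_add_to_physmap with XENMAPSPACE_grant_table. The following is only a minimal sketch of that idea, not the in-tree implementation; GRANT_GFN is a made-up target frame just above 4GB and the caller is assumed to have reserved that range:

/*
 * Sketch: ask Xen to place grant-table frame 'idx' at GRANT_GFN + idx
 * in the guest physmap, instead of relying on the MMIO hole of the
 * Xen platform PCI device. GRANT_GFN is hypothetical; a real caller
 * must reserve the range first and handle errors per frame.
 */
#include <xen/interface/memory.h>
#include <asm/xen/hypercall.h>

#define GRANT_GFN	(0x100000000ULL >> PAGE_SHIFT)	/* just past 4GB, hypothetical */

static int map_grant_frame(unsigned int idx)
{
	struct xen_add_to_physmap xatp = {
		.domid = DOMID_SELF,
		.space = XENMAPSPACE_grant_table,
		.idx   = idx,			/* grant-table frame number */
		.gpfn  = GRANT_GFN + idx,	/* where it appears in the guest physmap */
	};

	return HYPERVISOR_memory_op(XENMEM_add_to_physmap, &xatp);
}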
On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>
>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>
>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>>
>>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>>
>>> What issues do you see when using the HVM boot path for kexec?
>> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:

I couldn't go even that far. Is there anything besides the two libxl patches that you posted yesterday?

>>
>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> ...
>>
>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>>
>> 	/* Delay grant-table initialization in the PV on HVM case */
>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>> 		return 0;
>>
>> and gnttab_init() is later called in platform_pci_probe().
> But I guess this never happens in the PVH case because there's no Xen platform PCI device?
>
> Making the initialization of the grant tables conditional to the presence of the Xen platform PCI device seems wrong. The only thing needed for grant tables is a physical memory region. This can either be picked from unused physical memory (over 4GB to avoid collisions), or by freeing some RAM region.

That's because Linux HVM guests use PCI MMIO region for grant tables (see platform_pci_probe()).

-boris
On Tue, Mar 21, 2017 at 10:05:27AM -0400, Boris Ostrovsky wrote:
> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
> > On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
> >> Roger Pau Monne <roger.pau@citrix.com> writes:
> >>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
> >> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
> >>
> >> 	/* Delay grant-table initialization in the PV on HVM case */
> >> 	if (xen_hvm_domain() && !xen_pvh_domain())
> >> 		return 0;
> >>
> >> and gnttab_init() is later called in platform_pci_probe().
> > But I guess this never happens in the PVH case because there's no Xen platform PCI device?
> >
> > Making the initialization of the grant tables conditional to the presence of the Xen platform PCI device seems wrong. The only thing needed for grant tables is a physical memory region. This can either be picked from unused physical memory (over 4GB to avoid collisions), or by freeing some RAM region.
>
> That's because Linux HVM guests use PCI MMIO region for grant tables (see platform_pci_probe()).

There's no limitation in Xen that forces HVM guests to use the PCI MMIO hole of the Xen PCI device for the grant table. You can safely use a RAM region, or an unused physical range, probably above 4GB for safety. I'm not sure about what other things prevent booting a PVH guest using the same path as HVM, I guess the ACPI SCI interrupt is also one of them.

I wonder if it would make sense to announce using CPUID the things that differ from HVM (like the SCI over event channels), instead of simply advertising PVH. Boris, do you have a list of differences that prevent PVH from using the HVM code paths?

Roger.
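A sketch of the CPUID-based detection being floated here, purely illustrative: the Xen leaves are discovered via xen_cpuid_base(), while XEN_HVM_CPUID_PVH_GUEST is a hypothetical flag standing in for whatever per-feature bits (SCI via event channel, no platform device, ...) might end up being defined; no such bit exists at the time of this discussion:

#include <linux/types.h>
#include <asm/processor.h>		/* cpuid_eax() */
#include <asm/xen/hypervisor.h>		/* xen_cpuid_base() */

#define XEN_HVM_CPUID_PVH_GUEST	(1u << 5)	/* hypothetical flag, not defined by Xen */

static bool xen_detect_pvh_cpuid(void)
{
	uint32_t base = xen_cpuid_base();	/* 0 when not running on Xen */

	if (!base)
		return false;

	/* Leaf base+4 carries the HVM-specific feature flags in EAX. */
	return cpuid_eax(base + 4) & XEN_HVM_CPUID_PVH_GUEST;
}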
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
>> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>>>
>>>> What issues do you see when using the HVM boot path for kexec?
>>> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:
>
> I couldn't go even that far. Is there anything besides the two libxl patches that you posted yesterday?
>

No, the two patches should be enough. I only tested kdump so far.
Vitaly Kuznetsov <vkuznets@redhat.com> writes:

> Roger Pau Monne <roger.pau@citrix.com> writes:
>
>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>
>>> > On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>> >> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>> >
>>> > Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>
>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>
>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>
>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>
>> What issues do you see when using the HVM boot path for kexec?
>
> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:
>
> (XEN) Dom15 callback via changed to Direct Vector 0xf3
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
> ...
>
> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>
> 	/* Delay grant-table initialization in the PV on HVM case */
> 	if (xen_hvm_domain() && !xen_pvh_domain())
> 		return 0;
>
> and gnttab_init() is later called in platform_pci_probe().

Two more things:

There is xen_pvh_gnttab_setup() initcall which does

	if (!xen_pvh_domain())
		return -ENODEV;

and this is probably the source of over-allocation.

xen_has_pv_devices() has the following:

	if (xen_pv_domain() || xen_pvh_domain())
		return true;

which also has to be patched for PVH-after-kexec. So it seems we either have to remove all this stuff somehow or make PVH-ness detectable...
On 03/21/2017 10:16 AM, Roger Pau Monne wrote:
> On Tue, Mar 21, 2017 at 10:05:27AM -0400, Boris Ostrovsky wrote:
>> On 03/21/2017 08:13 AM, Roger Pau Monne wrote:
>>> On Tue, Mar 21, 2017 at 12:53:07PM +0100, Vitaly Kuznetsov wrote:
>>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>>>>
>>>> 	/* Delay grant-table initialization in the PV on HVM case */
>>>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>>>> 		return 0;
>>>>
>>>> and gnttab_init() is later called in platform_pci_probe().
>>> But I guess this never happens in the PVH case because there's no Xen platform PCI device?
>>>
>>> Making the initialization of the grant tables conditional to the presence of the Xen platform PCI device seems wrong. The only thing needed for grant tables is a physical memory region. This can either be picked from unused physical memory (over 4GB to avoid collisions), or by freeing some RAM region.
>> That's because Linux HVM guests use PCI MMIO region for grant tables (see platform_pci_probe()).
> There's no limitation in Xen that forces HVM guests to use the PCI MMIO hole of the Xen PCI device for the grant table. You can safely use a RAM region, or an unused physical range, probably above 4GB for safety. I'm not sure about what other things prevent booting a PVH guest using the same path as HVM, I guess the ACPI SCI interrupt is also one of them.

I think (hope?) using ACPI_IRQ_MODEL_PLATFORM for PVH guests takes care of this.

>
> I wonder if it would make sense to announce using CPUID the things that differ from HVM (like the SCI over event channels), instead of simply advertising PVH. Boris, do you have a list of differences that prevent PVH from using the HVM code paths?

There isn't much, really. And most of them are discoverable already. For example, we choose acpi_irq_model based on availability of IOAPICs, which we find out by parsing MADT. Similarly, for the problem at hand (lack of PCI platform device) we should be able to just search PCI space and not find it, shouldn't we?

We may need to change order of how things are done.

-boris
On 03/21/2017 10:44 AM, Vitaly Kuznetsov wrote:
> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>
>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>
>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>
>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>>
>>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>>
>>> What issues do you see when using the HVM boot path for kexec?
>> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:
>>
>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>> ...
>>
>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>>
>> 	/* Delay grant-table initialization in the PV on HVM case */
>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>> 		return 0;
>>
>> and gnttab_init() is later called in platform_pci_probe().
> Two more things:
>
> There is xen_pvh_gnttab_setup() initcall which does
>
> 	if (!xen_pvh_domain())
> 		return -ENODEV;
>
> and this is probably the source of over-allocation.
>
> xen_has_pv_devices() has the following:
>
> 	if (xen_pv_domain() || xen_pvh_domain())
> 		return true;
>
> which also has to be patched for PVH-after-kexec. So it seems we either have to remove all this stuff somehow or make PVH-ness detectable...
>

Can we read xenstore's dm-version? Or is it too early to get access there?

-boris
Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:

> On 03/21/2017 10:44 AM, Vitaly Kuznetsov wrote:
>> Vitaly Kuznetsov <vkuznets@redhat.com> writes:
>>
>>> Roger Pau Monne <roger.pau@citrix.com> writes:
>>>
>>>> On Tue, Mar 21, 2017 at 10:21:52AM +0100, Vitaly Kuznetsov wrote:
>>>>> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>>>>>
>>>>>> On 03/20/2017 02:20 PM, Vitaly Kuznetsov wrote:
>>>>>>> PVH guests after kexec boot like normal HVM guests and we're not entering xen_prepare_pvh()
>>>>>> Is it not? Aren't we going via xen_hvm_shutdown() and then SHUTDOWN_soft_reset which would restart at the same entry point as regular boot?
>>>>> No, we're not doing regular boot: from outside of the guest we don't really know where the new kernel is placed (as guest does it on its own). We do soft reset to clean things up and then guest jumps to the new kernel starting point by itself.
>>>>>
>>>>> We could (in theory, didn't try) make it jump to the PVH starting point but we'll have to at least prepare the right boot params for init_pvh_bootparams and this looks like additional complication. PVHVM-style startup suits us well but we still need to be PVH-aware.
>>>> We are going to have the same issue when booting PVH with OVMF, Linux will be started at the native UEFI entry point, and we will need some way to detect that we are running in PVH mode.
>>>>
>>>> What issues do you see when using the HVM boot path for kexec?
>>> The immediate issue I ran into was ballooning driver over-allocating with XENMEM_populate_physmap:
>>>
>>> (XEN) Dom15 callback via changed to Direct Vector 0xf3
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (175 of 512)
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> (XEN) memory.c:225:d15v0 Could not allocate order=0 extent: id=15 memflags=0 (0 of 512)
>>> (XEN) d15v0 Over-allocation for domain 15: 262401 > 262400
>>> ...
>>>
>>> I didn't investigate why it happens, setting xen_pvh=1 helped. Not sure if it's related, but I see the following code in __gnttab_init():
>>>
>>> 	/* Delay grant-table initialization in the PV on HVM case */
>>> 	if (xen_hvm_domain() && !xen_pvh_domain())
>>> 		return 0;
>>>
>>> and gnttab_init() is later called in platform_pci_probe().
>> Two more things:
>>
>> There is xen_pvh_gnttab_setup() initcall which does
>>
>> 	if (!xen_pvh_domain())
>> 		return -ENODEV;
>>
>> and this is probably the source of over-allocation.
>>
>> xen_has_pv_devices() has the following:
>>
>> 	if (xen_pv_domain() || xen_pvh_domain())
>> 		return true;
>>
>> which also has to be patched for PVH-after-kexec. So it seems we either have to remove all this stuff somehow or make PVH-ness detectable...
>>
>
> Can we read xenstore's dm-version? Or is it too early to get access there?
>

Seems to be too late: xenbus_init() is a postcore_initcall. Moreover, it seems that 'dm-version' lives under 'libxl' in xenstore, not sure if it's readable from the domain.
On Tue, Mar 21, 2017 at 06:10:06PM +0100, Vitaly Kuznetsov wrote:
> Boris Ostrovsky <boris.ostrovsky@oracle.com> writes:
>
> > Can we read xenstore's dm-version? Or is it too early to get access there?
> >
>
> Seems to be too late: xenbus_init() is a postcore_initcall. Moreover, it seems that 'dm-version' lives under 'libxl' in xenstore, not sure if it's readable from the domain.

I don't think you can trust anything inside of "libxl/" to be stable.

Roger.
PVH guests after kexec boot like normal HVM guests and we're not entering
xen_prepare_pvh() but we still want to know that we're PVH. This hack does
the job by using XEN_IOPORT_MAGIC but I didn't find any straightforward way
to do it. Did I miss something? Or shall we introduce a CPUID leaf or
something like that?

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
 arch/x86/xen/enlighten.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index ec1d5c4..4a30886 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -51,6 +51,7 @@
 #include <xen/hvm.h>
 #include <xen/hvc-console.h>
 #include <xen/acpi.h>
+#include <xen/platform_pci.h>

 #include <asm/paravirt.h>
 #include <asm/apic.h>
@@ -1765,6 +1766,20 @@ void __init xen_prepare_pvh(void)

 	x86_init.oem.arch_setup = xen_pvh_arch_setup;
 }
+
+static void xen_detect_pvh(void)
+{
+	short magic;
+
+	if (xen_pvh)
+		return;
+
+	magic = inw(XEN_IOPORT_MAGIC);
+	if (magic != XEN_IOPORT_MAGIC_VAL) {
+		xen_pvh = 1;
+		xen_pvh_arch_setup();
+	}
+}
 #endif

 void __ref xen_hvm_init_shared_info(void)
@@ -1912,6 +1927,9 @@ static void __init xen_hvm_guest_init(void)

 	init_hvm_pv_info();

+	/* Detect PVH booting after kexec */
+	xen_detect_pvh();
+
 	xen_hvm_init_shared_info();

 	xen_panic_handler_init();
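For context, the constants the patch relies on come from include/xen/platform_pci.h; the assumption (worth double-checking) is that on a PVH guest, where no device model sits behind the port, the inw() simply reads back all-ones:

/* From include/xen/platform_pci.h */
#define XEN_IOPORT_BASE		0x10
#define XEN_IOPORT_MAGIC	(XEN_IOPORT_BASE + 0)	/* 2-byte read */
#define XEN_IOPORT_MAGIC_VAL	0x49d2

/*
 * HVM: the emulated platform device answers on port 0x10, so
 *      inw(XEN_IOPORT_MAGIC) == XEN_IOPORT_MAGIC_VAL -> leave xen_pvh alone.
 * PVH: no device model behind the port; the read is assumed to return
 *      0xffff, so the comparison fails               -> xen_pvh = 1.
 */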