| Message ID | 20201203124159.3688-2-paul@xen.org (mailing list archive) |
|---|---|
| State | New, archived |
| Series | Xen ABI feature control |
On 03.12.2020 13:41, Paul Durrant wrote: > From: Paul Durrant <pdurrant@amazon.com> > > ...to control the visibility of the FIFO event channel operations > (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to > the guest. > > These operations were added to the public header in commit d2d50c2f308f > ("evtchn: add FIFO-based event channel ABI") and the first implementation > appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement > EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 > ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to > that, a guest issuing those operations would receive a return value of > -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but > running on an older (pre-4.4) Xen would fall back to using the 2-level event > channel interface upon seeing this return value. > > Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 > onwards has implications for hibernation of some Linux guests. During resume > from hibernation, there are two kernels involved: the "boot" kernel and the > "resume" kernel. The guest boot kernel may default to use FIFO operations and > instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the > other hand, the resume kernel keeps assuming 2-level, because it was hibernated > on a version of Xen that did not support the FIFO operations. > > To maintain compatibility it is necessary to make Xen behave as it did > before the new operations were added and hence the code in this patch ensures > that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel > operations will again result in -ENOSYS being returned to the guest. I have to admit I'm now even more concerned of the control for such going into Xen, the more with the now 2nd use in the subsequent patch. The implication of this would seem to be that whenever we add new hypercalls or sub-ops, a domain creation control would also need adding determining whether that new sub-op is actually okay to use by a guest. Or else I'd be keen to up front see criteria at least roughly outlined by which it could be established whether such an override control is needed. I'm also not convinced such controls really want to be opt-in rather than opt-out. While perhaps sensible as long as a feature is experimental, not exposing stuff by default may mean slower adoption of new (and hopefully better) functionality. I realize there's still the option of having the tool stack default to enable, and just the hypervisor defaulting to disable, but anyway. > --- a/xen/arch/arm/domain.c > +++ b/xen/arch/arm/domain.c > @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) > unsigned int max_vcpus; > > /* HVM and HAP must be set. 
IOMMU may or may not be */ > - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != > + if ( (config->flags & > + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != > (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) > { > dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", > --- a/xen/arch/arm/domain_build.c > +++ b/xen/arch/arm/domain_build.c > @@ -2478,7 +2478,8 @@ void __init create_domUs(void) > struct domain *d; > struct xen_domctl_createdomain d_cfg = { > .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, > - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > + XEN_DOMCTL_CDF_evtchn_fifo, > .max_evtchn_port = -1, > .max_grant_frames = 64, > .max_maptrack_frames = 1024, > --- a/xen/arch/arm/setup.c > +++ b/xen/arch/arm/setup.c > @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, > struct bootmodule *xen_bootmodule; > struct domain *dom0; > struct xen_domctl_createdomain dom0_cfg = { > - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > + XEN_DOMCTL_CDF_evtchn_fifo, > .max_evtchn_port = -1, > .max_grant_frames = gnttab_dom0_frames(), > .max_maptrack_frames = -1, > --- a/xen/arch/x86/setup.c > +++ b/xen/arch/x86/setup.c > @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, > const char *loader) > { > struct xen_domctl_createdomain dom0_cfg = { > - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, > + .flags = XEN_DOMCTL_CDF_evtchn_fifo | > + (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0), > .max_evtchn_port = -1, > .max_grant_frames = -1, > .max_maptrack_frames = -1, > --- a/xen/common/domain.c > +++ b/xen/common/domain.c > @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) > ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | > XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | > - XEN_DOMCTL_CDF_nested_virt) ) > + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) > { > dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); > return -EINVAL; All of the hunks above point out a scalability issue if we were to follow this route for even just a fair part of new sub-ops, and I suppose you've noticed this with the next patch presumably touching all the same places again. Jan
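For context, the behavioural change the patch description promises above — the FIFO sub-ops returning -ENOSYS when XEN_DOMCTL_CDF_evtchn_fifo is clear — boils down to a check of the stored creation flags in the event channel code. The sketch below is illustrative only and is not a hunk from this series; `d->options` does hold the XEN_DOMCTL_CDF_* flags a domain was created with, but the helper name and its placement are assumptions.

```c
#include <xen/sched.h>        /* struct domain, d->options */
#include <public/domctl.h>    /* XEN_DOMCTL_CDF_* flags */

/* Hypothetical helper, not taken from the series under review. */
static bool evtchn_fifo_usable(const struct domain *d)
{
    return d->options & XEN_DOMCTL_CDF_evtchn_fifo;
}

/*
 * do_event_channel_op() could then hide the three FIFO sub-ops, roughly:
 *
 *     case EVTCHNOP_init_control:
 *     case EVTCHNOP_expand_array:
 *     case EVTCHNOP_set_priority:
 *         if ( !evtchn_fifo_usable(current->domain) )
 *             return -ENOSYS;   (i.e. behave like a pre-4.4 hypervisor)
 *         ...
 */
```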
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 03 December 2020 15:23 > To: Paul Durrant <paul@xen.org> > Cc: Paul Durrant <pdurrant@amazon.com>; Eslam Elnikety <elnikety@amazon.com>; Ian Jackson > <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com>; Andrew Cooper > <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>; > Stefano Stabellini <sstabellini@kernel.org>; Christian Lindig <christian.lindig@citrix.com>; David > Scott <dave@recoil.org>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Roger Pau Monné > <roger.pau@citrix.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, > ... > > On 03.12.2020 13:41, Paul Durrant wrote: > > From: Paul Durrant <pdurrant@amazon.com> > > > > ...to control the visibility of the FIFO event channel operations > > (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to > > the guest. > > > > These operations were added to the public header in commit d2d50c2f308f > > ("evtchn: add FIFO-based event channel ABI") and the first implementation > > appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement > > EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 > > ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to > > that, a guest issuing those operations would receive a return value of > > -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but > > running on an older (pre-4.4) Xen would fall back to using the 2-level event > > channel interface upon seeing this return value. > > > > Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 > > onwards has implications for hibernation of some Linux guests. During resume > > from hibernation, there are two kernels involved: the "boot" kernel and the > > "resume" kernel. The guest boot kernel may default to use FIFO operations and > > instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the > > other hand, the resume kernel keeps assuming 2-level, because it was hibernated > > on a version of Xen that did not support the FIFO operations. > > > > To maintain compatibility it is necessary to make Xen behave as it did > > before the new operations were added and hence the code in this patch ensures > > that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel > > operations will again result in -ENOSYS being returned to the guest. > > I have to admit I'm now even more concerned of the control for such > going into Xen, the more with the now 2nd use in the subsequent patch. > The implication of this would seem to be that whenever we add new > hypercalls or sub-ops, a domain creation control would also need > adding determining whether that new sub-op is actually okay to use by > a guest. Or else I'd be keen to up front see criteria at least roughly > outlined by which it could be established whether such an override > control is needed. > Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see a change in its environment unless the VM administrator wants that to happen. > I'm also not convinced such controls really want to be opt-in rather > than opt-out. They really need to be opt-in I think. 
From a cloud provider PoV it is important that nothing in a customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated hypervisor version without risking crashing their VMs. > While perhaps sensible as long as a feature is > experimental, not exposing stuff by default may mean slower adoption > of new (and hopefully better) functionality. I realize there's still > the option of having the tool stack default to enable, and just the > hypervisor defaulting to disable, but anyway. > Ok. I don't see a problem in default-to-enable behaviour... but I guess we will need to add ABI features to migration stream to fix things properly. > > --- a/xen/arch/arm/domain.c > > +++ b/xen/arch/arm/domain.c > > @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) > > unsigned int max_vcpus; > > > > /* HVM and HAP must be set. IOMMU may or may not be */ > > - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != > > + if ( (config->flags & > > + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != > > (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) > > { > > dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", > > --- a/xen/arch/arm/domain_build.c > > +++ b/xen/arch/arm/domain_build.c > > @@ -2478,7 +2478,8 @@ void __init create_domUs(void) > > struct domain *d; > > struct xen_domctl_createdomain d_cfg = { > > .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, > > - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > > + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > > + XEN_DOMCTL_CDF_evtchn_fifo, > > .max_evtchn_port = -1, > > .max_grant_frames = 64, > > .max_maptrack_frames = 1024, > > --- a/xen/arch/arm/setup.c > > +++ b/xen/arch/arm/setup.c > > @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, > > struct bootmodule *xen_bootmodule; > > struct domain *dom0; > > struct xen_domctl_createdomain dom0_cfg = { > > - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > > + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > > + XEN_DOMCTL_CDF_evtchn_fifo, > > .max_evtchn_port = -1, > > .max_grant_frames = gnttab_dom0_frames(), > > .max_maptrack_frames = -1, > > --- a/xen/arch/x86/setup.c > > +++ b/xen/arch/x86/setup.c > > @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, > > const char *loader) > > { > > struct xen_domctl_createdomain dom0_cfg = { > > - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, > > + .flags = XEN_DOMCTL_CDF_evtchn_fifo | > > + (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0), > > .max_evtchn_port = -1, > > .max_grant_frames = -1, > > .max_maptrack_frames = -1, > > --- a/xen/common/domain.c > > +++ b/xen/common/domain.c > > @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) > > ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > > XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | > > XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | > > - XEN_DOMCTL_CDF_nested_virt) ) > > + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) > > { > > dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); > > return -EINVAL; > > All of the hunks above point out a scalability issue if we were to > follow this route for even just a fair part of new sub-ops, and I > suppose you've noticed this with the next patch presumably touching > all the same places again. Indeed. This solution works for now but is probably not what we want in the long run. 
Same goes for the current way we control viridian features via an HVM param. It is good enough for now IMO since domctl is not a stable interface. Any ideas about how we might implement a better interface in the longer term? Paul > > Jan
On 03.12.2020 16:45, Paul Durrant wrote: >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 03 December 2020 15:23 >> >> On 03.12.2020 13:41, Paul Durrant wrote: >>> From: Paul Durrant <pdurrant@amazon.com> >>> >>> ...to control the visibility of the FIFO event channel operations >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to >>> the guest. >>> >>> These operations were added to the public header in commit d2d50c2f308f >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to >>> that, a guest issuing those operations would receive a return value of >>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but >>> running on an older (pre-4.4) Xen would fall back to using the 2-level event >>> channel interface upon seeing this return value. >>> >>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 >>> onwards has implications for hibernation of some Linux guests. During resume >>> from hibernation, there are two kernels involved: the "boot" kernel and the >>> "resume" kernel. The guest boot kernel may default to use FIFO operations and >>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the >>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated >>> on a version of Xen that did not support the FIFO operations. >>> >>> To maintain compatibility it is necessary to make Xen behave as it did >>> before the new operations were added and hence the code in this patch ensures >>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel >>> operations will again result in -ENOSYS being returned to the guest. >> >> I have to admit I'm now even more concerned of the control for such >> going into Xen, the more with the now 2nd use in the subsequent patch. >> The implication of this would seem to be that whenever we add new >> hypercalls or sub-ops, a domain creation control would also need >> adding determining whether that new sub-op is actually okay to use by >> a guest. Or else I'd be keen to up front see criteria at least roughly >> outlined by which it could be established whether such an override >> control is needed. >> > > Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see a change in its environment unless the VM administrator wants that to happen. A new hypercall appearing is a change to the guest's environment, yes, but a backwards compatible one. I don't see how this would harm a guest. This and ... >> I'm also not convinced such controls really want to be opt-in rather >> than opt-out. > > They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated hypervisor version without risking crashing their VMs. ... this sound to me more like workarounds for buggy guests than functionality the hypervisor _needs_ to have. (I can appreciate the specific case here for the specific scenario you provide as an exception.) 
>>> --- a/xen/arch/arm/domain.c >>> +++ b/xen/arch/arm/domain.c >>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) >>> unsigned int max_vcpus; >>> >>> /* HVM and HAP must be set. IOMMU may or may not be */ >>> - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != >>> + if ( (config->flags & >>> + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != >>> (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) >>> { >>> dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", >>> --- a/xen/arch/arm/domain_build.c >>> +++ b/xen/arch/arm/domain_build.c >>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void) >>> struct domain *d; >>> struct xen_domctl_createdomain d_cfg = { >>> .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, >>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>> + XEN_DOMCTL_CDF_evtchn_fifo, >>> .max_evtchn_port = -1, >>> .max_grant_frames = 64, >>> .max_maptrack_frames = 1024, >>> --- a/xen/arch/arm/setup.c >>> +++ b/xen/arch/arm/setup.c >>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, >>> struct bootmodule *xen_bootmodule; >>> struct domain *dom0; >>> struct xen_domctl_createdomain dom0_cfg = { >>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>> + XEN_DOMCTL_CDF_evtchn_fifo, >>> .max_evtchn_port = -1, >>> .max_grant_frames = gnttab_dom0_frames(), >>> .max_maptrack_frames = -1, >>> --- a/xen/arch/x86/setup.c >>> +++ b/xen/arch/x86/setup.c >>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, >>> const char *loader) >>> { >>> struct xen_domctl_createdomain dom0_cfg = { >>> - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, >>> + .flags = XEN_DOMCTL_CDF_evtchn_fifo | >>> + (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0), >>> .max_evtchn_port = -1, >>> .max_grant_frames = -1, >>> .max_maptrack_frames = -1, >>> --- a/xen/common/domain.c >>> +++ b/xen/common/domain.c >>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) >>> ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>> XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | >>> XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | >>> - XEN_DOMCTL_CDF_nested_virt) ) >>> + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) >>> { >>> dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); >>> return -EINVAL; >> >> All of the hunks above point out a scalability issue if we were to >> follow this route for even just a fair part of new sub-ops, and I >> suppose you've noticed this with the next patch presumably touching >> all the same places again. > > Indeed. This solution works for now but is probably not what we want in the long run. Same goes for the current way we control viridian features via an HVM param. It is good enough for now IMO since domctl is not a stable interface. Any ideas about how we might implement a better interface in the longer term? While it has other downsides, Jürgen's proposal doesn't have any similar scalability issue afaics. Another possible model would seem to be to key new hypercalls to hypervisor CPUID leaf bits, and derive their availability from a guest's CPUID policy. Of course this won't work when needing to retrofit guarding like you want to do here. Jan
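To make the CPUID-keyed idea above concrete: availability of a new hypercall (or group of sub-ops) would be advertised via a bit in the hypervisor CPUID leaves and derived from the guest's CPUID policy, rather than from a dedicated creation flag. The snippet below is purely illustrative — the leaf index, bit and types are invented, and it deliberately avoids naming Xen's real CPUID policy machinery. Part of the appeal is that a CPUID policy is per-domain state the toolstack already constructs, so a new sub-op would only consume a bit rather than another XEN_DOMCTL_CDF_* flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Invented leaf/bit layout, for illustration only. */
#define HV_FEATURES_SUBLEAF    4u            /* offset from the 0x40000000 base */
#define HV_FEAT_EVTCHN_FIFO   (1u << 0)

/* Stand-in for a per-domain copy of the hypervisor-leaf CPUID policy. */
struct hv_cpuid_policy {
    uint32_t leaf_eax[8];                    /* eax of leaves base+0 .. base+7 */
};

/* A hypercall handler would consult the policy instead of a CDF_* flag. */
static bool guest_has_evtchn_fifo(const struct hv_cpuid_policy *p)
{
    return p->leaf_eax[HV_FEATURES_SUBLEAF] & HV_FEAT_EVTCHN_FIFO;
}
```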
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 03 December 2020 15:57 > To: paul@xen.org > Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson' > <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew > Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall' > <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig' > <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk' > <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, > ... > > On 03.12.2020 16:45, Paul Durrant wrote: > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: 03 December 2020 15:23 > >> > >> On 03.12.2020 13:41, Paul Durrant wrote: > >>> From: Paul Durrant <pdurrant@amazon.com> > >>> > >>> ...to control the visibility of the FIFO event channel operations > >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to > >>> the guest. > >>> > >>> These operations were added to the public header in commit d2d50c2f308f > >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation > >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement > >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 > >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to > >>> that, a guest issuing those operations would receive a return value of > >>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but > >>> running on an older (pre-4.4) Xen would fall back to using the 2-level event > >>> channel interface upon seeing this return value. > >>> > >>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 > >>> onwards has implications for hibernation of some Linux guests. During resume > >>> from hibernation, there are two kernels involved: the "boot" kernel and the > >>> "resume" kernel. The guest boot kernel may default to use FIFO operations and > >>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the > >>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated > >>> on a version of Xen that did not support the FIFO operations. > >>> > >>> To maintain compatibility it is necessary to make Xen behave as it did > >>> before the new operations were added and hence the code in this patch ensures > >>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel > >>> operations will again result in -ENOSYS being returned to the guest. > >> > >> I have to admit I'm now even more concerned of the control for such > >> going into Xen, the more with the now 2nd use in the subsequent patch. > >> The implication of this would seem to be that whenever we add new > >> hypercalls or sub-ops, a domain creation control would also need > >> adding determining whether that new sub-op is actually okay to use by > >> a guest. Or else I'd be keen to up front see criteria at least roughly > >> outlined by which it could be established whether such an override > >> control is needed. 
> >> > > > > Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be > opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see > a change in its environment unless the VM administrator wants that to happen. > > A new hypercall appearing is a change to the guest's environment, yes, > but a backwards compatible one. I don't see how this would harm a guest. Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes. > This and ... > > >> I'm also not convinced such controls really want to be opt-in rather > >> than opt-out. > > > > They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a > customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated > hypervisor version without risking crashing their VMs. > > ... this sound to me more like workarounds for buggy guests than > functionality the hypervisor _needs_ to have. (I can appreciate > the specific case here for the specific scenario you provide as > an exception.) If we want to have a hypervisor that can be used in a cloud environment then Xen absolutely needs this capability. > > >>> --- a/xen/arch/arm/domain.c > >>> +++ b/xen/arch/arm/domain.c > >>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) > >>> unsigned int max_vcpus; > >>> > >>> /* HVM and HAP must be set. IOMMU may or may not be */ > >>> - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != > >>> + if ( (config->flags & > >>> + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != > >>> (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) > >>> { > >>> dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", > >>> --- a/xen/arch/arm/domain_build.c > >>> +++ b/xen/arch/arm/domain_build.c > >>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void) > >>> struct domain *d; > >>> struct xen_domctl_createdomain d_cfg = { > >>> .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, > >>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > >>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>> + XEN_DOMCTL_CDF_evtchn_fifo, > >>> .max_evtchn_port = -1, > >>> .max_grant_frames = 64, > >>> .max_maptrack_frames = 1024, > >>> --- a/xen/arch/arm/setup.c > >>> +++ b/xen/arch/arm/setup.c > >>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, > >>> struct bootmodule *xen_bootmodule; > >>> struct domain *dom0; > >>> struct xen_domctl_createdomain dom0_cfg = { > >>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > >>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>> + XEN_DOMCTL_CDF_evtchn_fifo, > >>> .max_evtchn_port = -1, > >>> .max_grant_frames = gnttab_dom0_frames(), > >>> .max_maptrack_frames = -1, > >>> --- a/xen/arch/x86/setup.c > >>> +++ b/xen/arch/x86/setup.c > >>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, > >>> const char *loader) > >>> { > >>> struct xen_domctl_createdomain dom0_cfg = { > >>> - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, > >>> + .flags = XEN_DOMCTL_CDF_evtchn_fifo | > >>> + (IS_ENABLED(CONFIG_TBOOT) ? 
XEN_DOMCTL_CDF_s3_integrity : 0), > >>> .max_evtchn_port = -1, > >>> .max_grant_frames = -1, > >>> .max_maptrack_frames = -1, > >>> --- a/xen/common/domain.c > >>> +++ b/xen/common/domain.c > >>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) > >>> ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>> XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | > >>> XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | > >>> - XEN_DOMCTL_CDF_nested_virt) ) > >>> + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) > >>> { > >>> dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); > >>> return -EINVAL; > >> > >> All of the hunks above point out a scalability issue if we were to > >> follow this route for even just a fair part of new sub-ops, and I > >> suppose you've noticed this with the next patch presumably touching > >> all the same places again. > > > > Indeed. This solution works for now but is probably not what we want in the long run. Same goes for > the current way we control viridian features via an HVM param. It is good enough for now IMO since > domctl is not a stable interface. Any ideas about how we might implement a better interface in the > longer term? > > While it has other downsides, Jürgen's proposal doesn't have any > similar scalability issue afaics. Another possible model would > seem to be to key new hypercalls to hypervisor CPUID leaf bits, > and derive their availability from a guest's CPUID policy. Of > course this won't work when needing to retrofit guarding like > you want to do here. > Ok, I'll take a look hypfs as an immediate solution, if that's preferred. Paul
On 03.12.20 18:07, Paul Durrant wrote: >> -----Original Message----- >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 03 December 2020 15:57 >> To: paul@xen.org >> Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson' >> <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew >> Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall' >> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig' >> <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk' >> <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org >> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, >> ... >> >> On 03.12.2020 16:45, Paul Durrant wrote: >>>> From: Jan Beulich <jbeulich@suse.com> >>>> Sent: 03 December 2020 15:23 >>>> >>>> On 03.12.2020 13:41, Paul Durrant wrote: >>>>> From: Paul Durrant <pdurrant@amazon.com> >>>>> >>>>> ...to control the visibility of the FIFO event channel operations >>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to >>>>> the guest. >>>>> >>>>> These operations were added to the public header in commit d2d50c2f308f >>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation >>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement >>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 >>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to >>>>> that, a guest issuing those operations would receive a return value of >>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but >>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event >>>>> channel interface upon seeing this return value. >>>>> >>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 >>>>> onwards has implications for hibernation of some Linux guests. During resume >>>>> from hibernation, there are two kernels involved: the "boot" kernel and the >>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and >>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the >>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated >>>>> on a version of Xen that did not support the FIFO operations. >>>>> >>>>> To maintain compatibility it is necessary to make Xen behave as it did >>>>> before the new operations were added and hence the code in this patch ensures >>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel >>>>> operations will again result in -ENOSYS being returned to the guest. >>>> >>>> I have to admit I'm now even more concerned of the control for such >>>> going into Xen, the more with the now 2nd use in the subsequent patch. >>>> The implication of this would seem to be that whenever we add new >>>> hypercalls or sub-ops, a domain creation control would also need >>>> adding determining whether that new sub-op is actually okay to use by >>>> a guest. Or else I'd be keen to up front see criteria at least roughly >>>> outlined by which it could be established whether such an override >>>> control is needed. 
>>>> >>> >>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be >> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see >> a change in its environment unless the VM administrator wants that to happen. >> >> A new hypercall appearing is a change to the guest's environment, yes, >> but a backwards compatible one. I don't see how this would harm a guest. > > Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes. > >> This and ... >> >>>> I'm also not convinced such controls really want to be opt-in rather >>>> than opt-out. >>> >>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a >> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated >> hypervisor version without risking crashing their VMs. >> >> ... this sound to me more like workarounds for buggy guests than >> functionality the hypervisor _needs_ to have. (I can appreciate >> the specific case here for the specific scenario you provide as >> an exception.) > > If we want to have a hypervisor that can be used in a cloud environment then Xen absolutely needs this capability. > >> >>>>> --- a/xen/arch/arm/domain.c >>>>> +++ b/xen/arch/arm/domain.c >>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) >>>>> unsigned int max_vcpus; >>>>> >>>>> /* HVM and HAP must be set. IOMMU may or may not be */ >>>>> - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != >>>>> + if ( (config->flags & >>>>> + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != >>>>> (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) >>>>> { >>>>> dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", >>>>> --- a/xen/arch/arm/domain_build.c >>>>> +++ b/xen/arch/arm/domain_build.c >>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void) >>>>> struct domain *d; >>>>> struct xen_domctl_createdomain d_cfg = { >>>>> .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = 64, >>>>> .max_maptrack_frames = 1024, >>>>> --- a/xen/arch/arm/setup.c >>>>> +++ b/xen/arch/arm/setup.c >>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, >>>>> struct bootmodule *xen_bootmodule; >>>>> struct domain *dom0; >>>>> struct xen_domctl_createdomain dom0_cfg = { >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = gnttab_dom0_frames(), >>>>> .max_maptrack_frames = -1, >>>>> --- a/xen/arch/x86/setup.c >>>>> +++ b/xen/arch/x86/setup.c >>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, >>>>> const char *loader) >>>>> { >>>>> struct xen_domctl_createdomain dom0_cfg = { >>>>> - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, >>>>> + .flags = XEN_DOMCTL_CDF_evtchn_fifo | >>>>> + (IS_ENABLED(CONFIG_TBOOT) ? 
XEN_DOMCTL_CDF_s3_integrity : 0), >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = -1, >>>>> .max_maptrack_frames = -1, >>>>> --- a/xen/common/domain.c >>>>> +++ b/xen/common/domain.c >>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) >>>>> ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | >>>>> XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | >>>>> - XEN_DOMCTL_CDF_nested_virt) ) >>>>> + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) >>>>> { >>>>> dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); >>>>> return -EINVAL; >>>> >>>> All of the hunks above point out a scalability issue if we were to >>>> follow this route for even just a fair part of new sub-ops, and I >>>> suppose you've noticed this with the next patch presumably touching >>>> all the same places again. >>> >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes for >> the current way we control viridian features via an HVM param. It is good enough for now IMO since >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the >> longer term? >> >> While it has other downsides, Jürgen's proposal doesn't have any >> similar scalability issue afaics. Another possible model would >> seem to be to key new hypercalls to hypervisor CPUID leaf bits, >> and derive their availability from a guest's CPUID policy. Of >> course this won't work when needing to retrofit guarding like >> you want to do here. >> > > Ok, I'll take a look hypfs as an immediate solution, if that's preferred. Paul, if you'd like me to add a few patches to my series doing the basic framework (per-domain nodes, interfaces for setting struct domain fields via hypfs), I can do that. You could then concentrate on the tools side. Juergen
> -----Original Message----- [snip] > >>>> > >>>> All of the hunks above point out a scalability issue if we were to > >>>> follow this route for even just a fair part of new sub-ops, and I > >>>> suppose you've noticed this with the next patch presumably touching > >>>> all the same places again. > >>> > >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes > for > >> the current way we control viridian features via an HVM param. It is good enough for now IMO since > >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the > >> longer term? > >> > >> While it has other downsides, Jürgen's proposal doesn't have any > >> similar scalability issue afaics. Another possible model would > >> seem to be to key new hypercalls to hypervisor CPUID leaf bits, > >> and derive their availability from a guest's CPUID policy. Of > >> course this won't work when needing to retrofit guarding like > >> you want to do here. > >> > > > > Ok, I'll take a look hypfs as an immediate solution, if that's preferred. > > Paul, if you'd like me to add a few patches to my series doing the basic > framework (per-domain nodes, interfaces for setting struct domain fields > via hypfs), I can do that. You could then concentrate on the tools side. > That would be very helpful. Thanks Juergen. Paul
On 03.12.2020 18:07, Paul Durrant wrote: >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 03 December 2020 15:57 >> >> On 03.12.2020 16:45, Paul Durrant wrote: >>>> From: Jan Beulich <jbeulich@suse.com> >>>> Sent: 03 December 2020 15:23 >>>> >>>> On 03.12.2020 13:41, Paul Durrant wrote: >>>>> From: Paul Durrant <pdurrant@amazon.com> >>>>> >>>>> ...to control the visibility of the FIFO event channel operations >>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to >>>>> the guest. >>>>> >>>>> These operations were added to the public header in commit d2d50c2f308f >>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation >>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement >>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 >>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to >>>>> that, a guest issuing those operations would receive a return value of >>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but >>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event >>>>> channel interface upon seeing this return value. >>>>> >>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 >>>>> onwards has implications for hibernation of some Linux guests. During resume >>>>> from hibernation, there are two kernels involved: the "boot" kernel and the >>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and >>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the >>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated >>>>> on a version of Xen that did not support the FIFO operations. >>>>> >>>>> To maintain compatibility it is necessary to make Xen behave as it did >>>>> before the new operations were added and hence the code in this patch ensures >>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel >>>>> operations will again result in -ENOSYS being returned to the guest. >>>> >>>> I have to admit I'm now even more concerned of the control for such >>>> going into Xen, the more with the now 2nd use in the subsequent patch. >>>> The implication of this would seem to be that whenever we add new >>>> hypercalls or sub-ops, a domain creation control would also need >>>> adding determining whether that new sub-op is actually okay to use by >>>> a guest. Or else I'd be keen to up front see criteria at least roughly >>>> outlined by which it could be established whether such an override >>>> control is needed. >>>> >>> >>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be >> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see >> a change in its environment unless the VM administrator wants that to happen. >> >> A new hypercall appearing is a change to the guest's environment, yes, >> but a backwards compatible one. I don't see how this would harm a guest. > > Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes. 
If a guest recognizes a new hypercall is unavailable, why would it ever try to make use of it again, unless it knows it may appear after having been migrated (which implies it's prepared for the sudden appearance)? This to me still looks like an attempt to work around guest bugs. And a halfhearted one, as you cherry pick just two out of many additions to the original 3.0 ABI. >> This and ... >> >>>> I'm also not convinced such controls really want to be opt-in rather >>>> than opt-out. >>> >>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a >> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated >> hypervisor version without risking crashing their VMs. >> >> ... this sound to me more like workarounds for buggy guests than >> functionality the hypervisor _needs_ to have. (I can appreciate >> the specific case here for the specific scenario you provide as >> an exception.) > > If we want to have a hypervisor that can be used in a cloud environment > then Xen absolutely needs this capability. As per above you can conclude that I'm still struggling to see the "why" part here. >>>>> --- a/xen/arch/arm/domain.c >>>>> +++ b/xen/arch/arm/domain.c >>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) >>>>> unsigned int max_vcpus; >>>>> >>>>> /* HVM and HAP must be set. IOMMU may or may not be */ >>>>> - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != >>>>> + if ( (config->flags & >>>>> + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != >>>>> (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) >>>>> { >>>>> dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", >>>>> --- a/xen/arch/arm/domain_build.c >>>>> +++ b/xen/arch/arm/domain_build.c >>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void) >>>>> struct domain *d; >>>>> struct xen_domctl_createdomain d_cfg = { >>>>> .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = 64, >>>>> .max_maptrack_frames = 1024, >>>>> --- a/xen/arch/arm/setup.c >>>>> +++ b/xen/arch/arm/setup.c >>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, >>>>> struct bootmodule *xen_bootmodule; >>>>> struct domain *dom0; >>>>> struct xen_domctl_createdomain dom0_cfg = { >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = gnttab_dom0_frames(), >>>>> .max_maptrack_frames = -1, >>>>> --- a/xen/arch/x86/setup.c >>>>> +++ b/xen/arch/x86/setup.c >>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, >>>>> const char *loader) >>>>> { >>>>> struct xen_domctl_createdomain dom0_cfg = { >>>>> - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, >>>>> + .flags = XEN_DOMCTL_CDF_evtchn_fifo | >>>>> + (IS_ENABLED(CONFIG_TBOOT) ? 
XEN_DOMCTL_CDF_s3_integrity : 0), >>>>> .max_evtchn_port = -1, >>>>> .max_grant_frames = -1, >>>>> .max_maptrack_frames = -1, >>>>> --- a/xen/common/domain.c >>>>> +++ b/xen/common/domain.c >>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) >>>>> ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | >>>>> XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | >>>>> XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | >>>>> - XEN_DOMCTL_CDF_nested_virt) ) >>>>> + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) >>>>> { >>>>> dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); >>>>> return -EINVAL; >>>> >>>> All of the hunks above point out a scalability issue if we were to >>>> follow this route for even just a fair part of new sub-ops, and I >>>> suppose you've noticed this with the next patch presumably touching >>>> all the same places again. >>> >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes for >> the current way we control viridian features via an HVM param. It is good enough for now IMO since >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the >> longer term? >> >> While it has other downsides, Jürgen's proposal doesn't have any >> similar scalability issue afaics. Another possible model would >> seem to be to key new hypercalls to hypervisor CPUID leaf bits, >> and derive their availability from a guest's CPUID policy. Of >> course this won't work when needing to retrofit guarding like >> you want to do here. > > Ok, I'll take a look hypfs as an immediate solution, if that's preferred. Well, as said - there are also downsides with that approach. I'm not convinced it should be just the three of us to determine which one is better overall, the more that you don't seem to be convinced anyway (and I'm not going to claim I am, in either direction). It is my understanding that based on the claimed need for this (see above), this may become very widely used functionality, and if so we want to make sure we won't regret the route we went. For starters, just to get a better overall picture, besides the two overrides you add here, do you have any plans to retroactively add further controls for past ABI additions? I don't suppose you have plans to consistently do this for all of them? I ask because I could see us going the route you've chosen for very few retroactive additions of overrides, but use the CPUID approach (provided there's a suitable Arm counterpart, and ideally some understanding whether something similar could also be found for at least the two planned new ports) for ABI additions going forward. Jan
> -----Original Message----- > From: Jan Beulich <jbeulich@suse.com> > Sent: 04 December 2020 07:53 > To: paul@xen.org > Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson' > <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew > Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall' > <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig' > <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk' > <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org > Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, > ... > > On 03.12.2020 18:07, Paul Durrant wrote: > >> From: Jan Beulich <jbeulich@suse.com> > >> Sent: 03 December 2020 15:57 > >> > >> On 03.12.2020 16:45, Paul Durrant wrote: > >>>> From: Jan Beulich <jbeulich@suse.com> > >>>> Sent: 03 December 2020 15:23 > >>>> > >>>> On 03.12.2020 13:41, Paul Durrant wrote: > >>>>> From: Paul Durrant <pdurrant@amazon.com> > >>>>> > >>>>> ...to control the visibility of the FIFO event channel operations > >>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to > >>>>> the guest. > >>>>> > >>>>> These operations were added to the public header in commit d2d50c2f308f > >>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation > >>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement > >>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61 > >>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to > >>>>> that, a guest issuing those operations would receive a return value of > >>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but > >>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event > >>>>> channel interface upon seeing this return value. > >>>>> > >>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4 > >>>>> onwards has implications for hibernation of some Linux guests. During resume > >>>>> from hibernation, there are two kernels involved: the "boot" kernel and the > >>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and > >>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the > >>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated > >>>>> on a version of Xen that did not support the FIFO operations. > >>>>> > >>>>> To maintain compatibility it is necessary to make Xen behave as it did > >>>>> before the new operations were added and hence the code in this patch ensures > >>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel > >>>>> operations will again result in -ENOSYS being returned to the guest. > >>>> > >>>> I have to admit I'm now even more concerned of the control for such > >>>> going into Xen, the more with the now 2nd use in the subsequent patch. > >>>> The implication of this would seem to be that whenever we add new > >>>> hypercalls or sub-ops, a domain creation control would also need > >>>> adding determining whether that new sub-op is actually okay to use by > >>>> a guest. Or else I'd be keen to up front see criteria at least roughly > >>>> outlined by which it could be established whether such an override > >>>> control is needed. 
> >>>> > >>> > >>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be > >> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not > see > >> a change in its environment unless the VM administrator wants that to happen. > >> > >> A new hypercall appearing is a change to the guest's environment, yes, > >> but a backwards compatible one. I don't see how this would harm a guest. > > > > Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we > start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer > Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst > case, the guest is sufficiently confused that it crashes. > > If a guest recognizes a new hypercall is unavailable, why would it ever try > to make use of it again, unless it knows it may appear after having been > migrated (which implies it's prepared for the sudden appearance)? This to > me still looks like an attempt to work around guest bugs. And a halfhearted > one, as you cherry pick just two out of many additions to the original 3.0 > ABI. > This is exactly an attempt to work around guest bugs, which is something a cloud provider needs to do where possible. > >> This and ... > >> > >>>> I'm also not convinced such controls really want to be opt-in rather > >>>> than opt-out. > >>> > >>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a > >> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated > >> hypervisor version without risking crashing their VMs. > >> > >> ... this sound to me more like workarounds for buggy guests than > >> functionality the hypervisor _needs_ to have. (I can appreciate > >> the specific case here for the specific scenario you provide as > >> an exception.) > > > > If we want to have a hypervisor that can be used in a cloud environment > > then Xen absolutely needs this capability. > > As per above you can conclude that I'm still struggling to see the > "why" part here. > Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course. Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer. 
> >>>>> --- a/xen/arch/arm/domain.c > >>>>> +++ b/xen/arch/arm/domain.c > >>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config) > >>>>> unsigned int max_vcpus; > >>>>> > >>>>> /* HVM and HAP must be set. IOMMU may or may not be */ > >>>>> - if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) != > >>>>> + if ( (config->flags & > >>>>> + ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) != > >>>>> (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) ) > >>>>> { > >>>>> dprintk(XENLOG_INFO, "Unsupported configuration %#x\n", > >>>>> --- a/xen/arch/arm/domain_build.c > >>>>> +++ b/xen/arch/arm/domain_build.c > >>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void) > >>>>> struct domain *d; > >>>>> struct xen_domctl_createdomain d_cfg = { > >>>>> .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE, > >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, > >>>>> .max_evtchn_port = -1, > >>>>> .max_grant_frames = 64, > >>>>> .max_maptrack_frames = 1024, > >>>>> --- a/xen/arch/arm/setup.c > >>>>> +++ b/xen/arch/arm/setup.c > >>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset, > >>>>> struct bootmodule *xen_bootmodule; > >>>>> struct domain *dom0; > >>>>> struct xen_domctl_createdomain dom0_cfg = { > >>>>> - .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap, > >>>>> + .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>>>> + XEN_DOMCTL_CDF_evtchn_fifo, > >>>>> .max_evtchn_port = -1, > >>>>> .max_grant_frames = gnttab_dom0_frames(), > >>>>> .max_maptrack_frames = -1, > >>>>> --- a/xen/arch/x86/setup.c > >>>>> +++ b/xen/arch/x86/setup.c > >>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image, > >>>>> const char *loader) > >>>>> { > >>>>> struct xen_domctl_createdomain dom0_cfg = { > >>>>> - .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0, > >>>>> + .flags = XEN_DOMCTL_CDF_evtchn_fifo | > >>>>> + (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0), > >>>>> .max_evtchn_port = -1, > >>>>> .max_grant_frames = -1, > >>>>> .max_maptrack_frames = -1, > >>>>> --- a/xen/common/domain.c > >>>>> +++ b/xen/common/domain.c > >>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config) > >>>>> ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap | > >>>>> XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off | > >>>>> XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu | > >>>>> - XEN_DOMCTL_CDF_nested_virt) ) > >>>>> + XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) ) > >>>>> { > >>>>> dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags); > >>>>> return -EINVAL; > >>>> > >>>> All of the hunks above point out a scalability issue if we were to > >>>> follow this route for even just a fair part of new sub-ops, and I > >>>> suppose you've noticed this with the next patch presumably touching > >>>> all the same places again. > >>> > >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes > for > >> the current way we control viridian features via an HVM param. It is good enough for now IMO since > >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the > >> longer term? > >> > >> While it has other downsides, Jürgen's proposal doesn't have any > >> similar scalability issue afaics. 
Another possible model would > >> seem to be to key new hypercalls to hypervisor CPUID leaf bits, > >> and derive their availability from a guest's CPUID policy. Of > >> course this won't work when needing to retrofit guarding like > >> you want to do here. > > > > Ok, I'll take a look hypfs as an immediate solution, if that's preferred. > > Well, as said - there are also downsides with that approach. I'm > not convinced it should be just the three of us to determine which > one is better overall, the more that you don't seem to be convinced > anyway (and I'm not going to claim I am, in either direction). It > is my understanding that based on the claimed need for this (see > above), this may become very widely used functionality, and if so > we want to make sure we won't regret the route we went. > Agreed, but we don't need the final top-to-bottom solution now. The only part of this that is introducing something that is stable is the libxl part, and I think the 'feature' bitmap is reasonable future-proof (being modelled on the viridian feature bitmap, which has been extended over several releases since it was introduced). > For starters, just to get a better overall picture, besides the two > overrides you add here, do you have any plans to retroactively add > further controls for past ABI additions? I don't have any specific plans. The two I deal with here are the causes of observed differences in guest behaviour, one being an actual crash and the other affecting PV driver behaviour which may or may not be the cause of other crashes... but still something we need to have control over. > I don't suppose you have > plans to consistently do this for all of them? I ask because I > could see us going the route you've chosen for very few retroactive > additions of overrides, but use the CPUID approach (provided > there's a suitable Arm counterpart, and ideally some understanding > whether something similar could also be found for at least the two > planned new ports) for ABI additions going forward. > That sounds reasonable and I'd be perfectly happy to rip and replace the use of domain creation flags when we have something 'better'. We don't need to wait for that though... this series is perfectly functional and doesn't take us down any one-way streets (below libxl) AFAICT. Paul > Jan
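The guest-side pattern being debated here — probe once for the FIFO ABI and fall back to 2-level when the sub-ops are unavailable — looks roughly like the sketch below. The hypercall wrapper hypervisor_event_channel_op() is a placeholder for whatever the guest kernel actually provides, not a real API; EVTCHNOP_init_control and struct evtchn_init_control are the genuine definitions from Xen's public event_channel.h.

/*
 * Sketch only: probe for the FIFO event channel ABI and fall back to the
 * 2-level ABI when the sub-ops are unavailable.  hypervisor_event_channel_op()
 * is a placeholder for the guest kernel's hypercall wrapper, not a real API.
 */
#include <stdbool.h>
#include <stdint.h>
#include <xen/event_channel.h>      /* EVTCHNOP_init_control et al. */

extern long hypervisor_event_channel_op(int cmd, void *arg);   /* placeholder */

/* Returns true if Xen accepted the switch to the FIFO ABI for this vCPU. */
static bool evtchn_try_init_fifo(uint64_t control_gfn, uint32_t vcpu)
{
    struct evtchn_init_control init_control = {
        .control_gfn = control_gfn,
        .offset      = 0,
        .vcpu        = vcpu,
    };

    /*
     * A hypervisor without the FIFO sub-ops (or, with this patch, a domain
     * created without XEN_DOMCTL_CDF_evtchn_fifo) fails this with -ENOSYS,
     * and the guest must stay on the 2-level ABI.
     */
    return hypervisor_event_channel_op(EVTCHNOP_init_control,
                                       &init_control) == 0;
}

A guest that runs this probe once and then assumes the answer holds forever is precisely the kind of guest that needs its environment to stay stable across a hypervisor upgrade or a migration.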
On 04.12.2020 09:22, Paul Durrant wrote: >> From: Jan Beulich <jbeulich@suse.com> >> Sent: 04 December 2020 07:53 >> >> On 03.12.2020 18:07, Paul Durrant wrote: >>>> From: Jan Beulich <jbeulich@suse.com> >>>> Sent: 03 December 2020 15:57 >>>> >>>> ... this sound to me more like workarounds for buggy guests than >>>> functionality the hypervisor _needs_ to have. (I can appreciate >>>> the specific case here for the specific scenario you provide as >>>> an exception.) >>> >>> If we want to have a hypervisor that can be used in a cloud environment >>> then Xen absolutely needs this capability. >> >> As per above you can conclude that I'm still struggling to see the >> "why" part here. >> > > Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course. That's a situation OSes are in all the time. Buggy applications may stop working on newer OS versions. It's still the application that's in need of updating then. I guess OSes may choose to work around some very common applications' bugs, but I'd then wonder on what basis "very common" gets established. I dislike the underlying asymmetry / inconsistency (if not unfairness) of such a model, despite seeing that there may be business reasons leading people to think they want something like this. > Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer. Boot the guest on a not-yet-upgraded host again, to update its kernel? >>>> While it has other downsides, Jürgen's proposal doesn't have any >>>> similar scalability issue afaics. Another possible model would >>>> seem to be to key new hypercalls to hypervisor CPUID leaf bits, >>>> and derive their availability from a guest's CPUID policy. Of >>>> course this won't work when needing to retrofit guarding like >>>> you want to do here. >>> >>> Ok, I'll take a look hypfs as an immediate solution, if that's preferred. >> >> Well, as said - there are also downsides with that approach. I'm >> not convinced it should be just the three of us to determine which >> one is better overall, the more that you don't seem to be convinced >> anyway (and I'm not going to claim I am, in either direction). It >> is my understanding that based on the claimed need for this (see >> above), this may become very widely used functionality, and if so >> we want to make sure we won't regret the route we went. >> > > Agreed, but we don't need the final top-to-bottom solution now. 
The only part of this that is introducing something that is stable is the libxl part, and I think the 'feature' bitmap is reasonable future-proof (being modelled on the viridian feature bitmap, which has been extended over several releases since it was introduced). > >> For starters, just to get a better overall picture, besides the two >> overrides you add here, do you have any plans to retroactively add >> further controls for past ABI additions? > > I don't have any specific plans. The two I deal with here are the causes of observed differences in guest behaviour, one being an actual crash and the other affecting PV driver behaviour which may or may not be the cause of other crashes... but still something we need to have control over. I.e. basically an arbitrary choice. This is again a symmetry / consistency / fairness issue. If a guest admin pesters his cloud provider enough, they may get the cloud provider to make hypervisor adjustments. If another guest admin simply does the technically correct thing and works out what needs fixing in the kernel, they may then figure they simply didn't shout loud enough to avoid themselves needing to do much work. Anyway - I guess I'm about the only one seeing this from a purely technical, non-business perspective. I suppose I'll continue looking at the code for the purpose of finding issues (hopefully there aren't going to be any), but will stay away from acking any parts of this. Whoever agrees with the underlying concept can then provide their acks. (As said elsewhere, in the particular case of the kexec issue with FIFO event channels here, I could maybe see that as halfway acceptable justification, albeit I did voice my concerns in that regard as well. It's still working around a shortcoming in guest software.) Nevertheless I'd like to gain clarity of what the plans are with future ABI additions. Jan
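For reference, the CPUID-policy alternative mentioned in this exchange would, from the guest's side, amount to something like the sketch below. The 0x40000000 base leaf and its "XenVMMXenVMM" signature are real; the feature leaf offset and the XEN_FEAT_EVTCHN_FIFO bit are invented placeholders, since no such enumeration exists in the ABI today.

/*
 * Sketch only: derive hypercall availability from a CPUID policy.  The
 * 0x40000000 base leaf and "XenVMMXenVMM" signature are real; XEN_FEAT_LEAF
 * and XEN_FEAT_EVTCHN_FIFO are invented for illustration.
 */
#include <cpuid.h>          /* GCC/clang __cpuid() macro */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define XEN_CPUID_BASE       0x40000000u
#define XEN_FEAT_LEAF        (XEN_CPUID_BASE + 4)  /* hypothetical */
#define XEN_FEAT_EVTCHN_FIFO (1u << 0)             /* hypothetical */

static bool xen_cpuid_feature(uint32_t mask)
{
    uint32_t eax, ebx, ecx, edx;
    char sig[13];

    __cpuid(XEN_CPUID_BASE, eax, ebx, ecx, edx);
    memcpy(sig + 0, &ebx, 4);
    memcpy(sig + 4, &ecx, 4);
    memcpy(sig + 8, &edx, 4);
    sig[12] = '\0';

    /* Not running on Xen, or the (hypothetical) feature leaf is absent. */
    if ( strcmp(sig, "XenVMMXenVMM") != 0 || eax < XEN_FEAT_LEAF )
        return false;

    __cpuid(XEN_FEAT_LEAF, eax, ebx, ecx, edx);
    return eax & mask;
}

The attraction of this scheme is that x86 already has per-domain CPUID policy machinery in the toolstack; as noted in the exchange above, the open questions are retrofitting it to existing sub-ops and finding a suitable counterpart on Arm.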
Hi, I haven't looked at the series yet. Just adding some thoughts on why one would want such option. On 04/12/2020 09:43, Jan Beulich wrote: > On 04.12.2020 09:22, Paul Durrant wrote: >>> From: Jan Beulich <jbeulich@suse.com> >>> Sent: 04 December 2020 07:53 >>> >>> On 03.12.2020 18:07, Paul Durrant wrote: >>>>> From: Jan Beulich <jbeulich@suse.com> >>>>> Sent: 03 December 2020 15:57 >>>>> >>>>> ... this sound to me more like workarounds for buggy guests than >>>>> functionality the hypervisor _needs_ to have. (I can appreciate >>>>> the specific case here for the specific scenario you provide as >>>>> an exception.) >>>> >>>> If we want to have a hypervisor that can be used in a cloud environment >>>> then Xen absolutely needs this capability. >>> >>> As per above you can conclude that I'm still struggling to see the >>> "why" part here. >>> >> >> Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course. > > That's a situation OSes are in all the time. Buggy applications may > stop working on newer OS versions. It's still the application that's > in need of updating then. I guess OSes may choose to work around > some very common applications' bugs, but I'd then wonder on what > basis "very common" gets established. I dislike the underlying > asymmetry / inconsistency (if not unfairness) of such a model, > despite seeing that there may be business reasons leading people to > think they want something like this. The discussion seems to be geared towards buggy guest so far. However, this is not the only reason that one my want to avoid exposing some features: 1) From the recent security issues (such as XSA-343), a knob to disable FIFO would be quite beneficials for vendors that don't need the feature. 2) Fleet management purpose. You may have a fleet with multiple versions of Xen. You don't want your customer to start relying on features that may not be available on all the hosts otherwise it complicates the guest placement. FAOD, I am sure there might be other features that need to be disabled. But we have to start somewhere :). > >> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer. > > Boot the guest on a not-yet-upgraded host again, to update its kernel? You are making the assumption that the customer would have the choice to target a specific versions of Xen. This may be undesirable for a cloud provider as suddenly your customer may want to stick on the old version of Xen.
On 04/12/2020 11:45, Julien Grall wrote: > Hi, > > I haven't looked at the series yet. Just adding some thoughts on why > one would want such option. > > On 04/12/2020 09:43, Jan Beulich wrote: >> On 04.12.2020 09:22, Paul Durrant wrote: >>>> From: Jan Beulich <jbeulich@suse.com> >>>> Sent: 04 December 2020 07:53 >>>> >>>> On 03.12.2020 18:07, Paul Durrant wrote: >>>>>> From: Jan Beulich <jbeulich@suse.com> >>>>>> Sent: 03 December 2020 15:57 >>>>>> >>>>>> ... this sound to me more like workarounds for buggy guests than >>>>>> functionality the hypervisor _needs_ to have. (I can appreciate >>>>>> the specific case here for the specific scenario you provide as >>>>>> an exception.) >>>>> >>>>> If we want to have a hypervisor that can be used in a cloud >>>>> environment >>>>> then Xen absolutely needs this capability. >>>> >>>> As per above you can conclude that I'm still struggling to see the >>>> "why" part here. >>>> >>> >>> Imagine you are a customer. You boot your OS and everything is just >>> fine... you run your workload and all is good. You then shut down >>> your VM and re-start it. Now it starts to crash. Who are you going >>> to blame? You did nothing to your OS or application s/w, so you are >>> going to blame the cloud provider of course. >> >> That's a situation OSes are in all the time. Buggy applications may >> stop working on newer OS versions. It's still the application that's >> in need of updating then. I guess OSes may choose to work around >> some very common applications' bugs, but I'd then wonder on what >> basis "very common" gets established. I dislike the underlying >> asymmetry / inconsistency (if not unfairness) of such a model, >> despite seeing that there may be business reasons leading people to >> think they want something like this. > > The discussion seems to be geared towards buggy guest so far. However, > this is not the only reason that one my want to avoid exposing some > features: > > 1) From the recent security issues (such as XSA-343), a knob to > disable FIFO would be quite beneficials for vendors that don't need > the feature. > > 2) Fleet management purpose. You may have a fleet with multiple > versions of Xen. You don't want your customer to start relying on > features that may not be available on all the hosts otherwise it > complicates the guest placement. > > FAOD, I am sure there might be other features that need to be > disabled. But we have to start somewhere :). Absolutely top of the list, importance wise, is so we can test different configurations, without needing to rebuild the hypervisor (and to a lesser extent, without having to reboot). It is a mistake that events/grants/etc were ever available unilaterally in HVM guests. This is definitely a step in the right direction (but I thought it would be too rude to ask Paul to make all of those CDF flags at once). ~Andrew
On Fri, 4 Dec 2020, Andrew Cooper wrote: > On 04/12/2020 11:45, Julien Grall wrote: > > Hi, > > > > I haven't looked at the series yet. Just adding some thoughts on why > > one would want such option. > > > > On 04/12/2020 09:43, Jan Beulich wrote: > >> On 04.12.2020 09:22, Paul Durrant wrote: > >>>> From: Jan Beulich <jbeulich@suse.com> > >>>> Sent: 04 December 2020 07:53 > >>>> > >>>> On 03.12.2020 18:07, Paul Durrant wrote: > >>>>>> From: Jan Beulich <jbeulich@suse.com> > >>>>>> Sent: 03 December 2020 15:57 > >>>>>> > >>>>>> ... this sound to me more like workarounds for buggy guests than > >>>>>> functionality the hypervisor _needs_ to have. (I can appreciate > >>>>>> the specific case here for the specific scenario you provide as > >>>>>> an exception.) > >>>>> > >>>>> If we want to have a hypervisor that can be used in a cloud > >>>>> environment > >>>>> then Xen absolutely needs this capability. > >>>> > >>>> As per above you can conclude that I'm still struggling to see the > >>>> "why" part here. > >>>> > >>> > >>> Imagine you are a customer. You boot your OS and everything is just > >>> fine... you run your workload and all is good. You then shut down > >>> your VM and re-start it. Now it starts to crash. Who are you going > >>> to blame? You did nothing to your OS or application s/w, so you are > >>> going to blame the cloud provider of course. > >> > >> That's a situation OSes are in all the time. Buggy applications may > >> stop working on newer OS versions. It's still the application that's > >> in need of updating then. I guess OSes may choose to work around > >> some very common applications' bugs, but I'd then wonder on what > >> basis "very common" gets established. I dislike the underlying > >> asymmetry / inconsistency (if not unfairness) of such a model, > >> despite seeing that there may be business reasons leading people to > >> think they want something like this. > > > > The discussion seems to be geared towards buggy guest so far. However, > > this is not the only reason that one my want to avoid exposing some > > features: > > > > 1) From the recent security issues (such as XSA-343), a knob to > > disable FIFO would be quite beneficials for vendors that don't need > > the feature. > > > > 2) Fleet management purpose. You may have a fleet with multiple > > versions of Xen. You don't want your customer to start relying on > > features that may not be available on all the hosts otherwise it > > complicates the guest placement. > > > > FAOD, I am sure there might be other features that need to be > > disabled. But we have to start somewhere :). > > Absolutely top of the list, importance wise, is so we can test different > configurations, without needing to rebuild the hypervisor (and to a > lesser extent, without having to reboot). > > It is a mistake that events/grants/etc were ever available unilaterally > in HVM guests. This is definitely a step in the right direction (but I > thought it would be too rude to ask Paul to make all of those CDF flags > at once). +1 For FuSa we'll need to be able to disable them at some point soon.
On 04/12/2020 17:41, Stefano Stabellini wrote: >>> FAOD, I am sure there might be other features that need to be >>> disabled. But we have to start somewhere :). >> Absolutely top of the list, importance wise, is so we can test different >> configurations, without needing to rebuild the hypervisor (and to a >> lesser extent, without having to reboot). >> >> It is a mistake that events/grants/etc were ever available unilaterally >> in HVM guests. This is definitely a step in the right direction (but I >> thought it would be too rude to ask Paul to make all of those CDF flags >> at once). > +1 > > For FuSa we'll need to be able to disable them at some point soon. FWIW, I have a proper plan for this stuff, which starts alongside the fixed toolstack ABI, and will cover all aspects of optional functionality in a domain. ~Andrew
> -----Original Message----- > From: Andrew Cooper <andrew.cooper3@citrix.com> > Sent: 04 December 2020 17:45 > To: Stefano Stabellini <sstabellini@kernel.org> > Cc: Julien Grall <julien@xen.org>; Jan Beulich <jbeulich@suse.com>; paul@xen.org; Durrant, Paul > <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; 'Ian Jackson' <iwj@xenproject.org>; > 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'George Dunlap' > <george.dunlap@citrix.com>; 'Christian Lindig' <christian.lindig@citrix.com>; 'David Scott' > <dave@recoil.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' > <roger.pau@citrix.com>; xen-devel@lists.xenproject.org > Subject: RE: [EXTERNAL] [PATCH v5 1/4] domctl: introduce a new domain create flag, > XEN_DOMCTL_CDF_evtchn_fifo, ... > > On 04/12/2020 17:41, Stefano Stabellini wrote: > >>> FAOD, I am sure there might be other features that need to be > >>> disabled. But we have to start somewhere :). > >> Absolutely top of the list, importance wise, is so we can test different > >> configurations, without needing to rebuild the hypervisor (and to a > >> lesser extent, without having to reboot). > >> > >> It is a mistake that events/grants/etc were ever available unilaterally > >> in HVM guests. This is definitely a step in the right direction (but I > >> thought it would be too rude to ask Paul to make all of those CDF flags > >> at once). > > +1 > > > > For FuSa we'll need to be able to disable them at some point soon. > > FWIW, I have a proper plan for this stuff, which start alongside the > fixed toolstack ABI, and will cover all aspects of optional > functionality in a domain. > OK. Can we live with this series as it stands until that point? There is some urgency to get at least these two things fixed. Paul > ~Andrew
On Fri, 4 Dec 2020, Durrant, Paul wrote: > > -----Original Message----- > > From: Andrew Cooper <andrew.cooper3@citrix.com> > > Sent: 04 December 2020 17:45 > > To: Stefano Stabellini <sstabellini@kernel.org> > > Cc: Julien Grall <julien@xen.org>; Jan Beulich <jbeulich@suse.com>; paul@xen.org; Durrant, Paul > > <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; 'Ian Jackson' <iwj@xenproject.org>; > > 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'George Dunlap' > > <george.dunlap@citrix.com>; 'Christian Lindig' <christian.lindig@citrix.com>; 'David Scott' > > <dave@recoil.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' > > <roger.pau@citrix.com>; xen-devel@lists.xenproject.org > > Subject: RE: [EXTERNAL] [PATCH v5 1/4] domctl: introduce a new domain create flag, > > XEN_DOMCTL_CDF_evtchn_fifo, ... > > > > On 04/12/2020 17:41, Stefano Stabellini wrote: > > >>> FAOD, I am sure there might be other features that need to be > > >>> disabled. But we have to start somewhere :). > > >> Absolutely top of the list, importance wise, is so we can test different > > >> configurations, without needing to rebuild the hypervisor (and to a > > >> lesser extent, without having to reboot). > > >> > > >> It is a mistake that events/grants/etc were ever available unilaterally > > >> in HVM guests. This is definitely a step in the right direction (but I > > >> thought it would be too rude to ask Paul to make all of those CDF flags > > >> at once). > > > +1 > > > > > > For FuSa we'll need to be able to disable them at some point soon. > > > > FWIW, I have a proper plan for this stuff, which start alongside the > > fixed toolstack ABI, and will cover all aspects of optional > > functionality in a domain. > > > > OK. Can we live with this series as it stands until that point? There is some urgency to get at least these two things fixed. I am happy to take things one step at a time, and this is a good step forward.
On 04.12.2020 12:45, Julien Grall wrote: > Hi, > > I haven't looked at the series yet. Just adding some thoughts on why one > would want such option. > > On 04/12/2020 09:43, Jan Beulich wrote: >> On 04.12.2020 09:22, Paul Durrant wrote: >>>> From: Jan Beulich <jbeulich@suse.com> >>>> Sent: 04 December 2020 07:53 >>>> >>>> On 03.12.2020 18:07, Paul Durrant wrote: >>>>>> From: Jan Beulich <jbeulich@suse.com> >>>>>> Sent: 03 December 2020 15:57 >>>>>> >>>>>> ... this sound to me more like workarounds for buggy guests than >>>>>> functionality the hypervisor _needs_ to have. (I can appreciate >>>>>> the specific case here for the specific scenario you provide as >>>>>> an exception.) >>>>> >>>>> If we want to have a hypervisor that can be used in a cloud environment >>>>> then Xen absolutely needs this capability. >>>> >>>> As per above you can conclude that I'm still struggling to see the >>>> "why" part here. >>>> >>> >>> Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course. >> >> That's a situation OSes are in all the time. Buggy applications may >> stop working on newer OS versions. It's still the application that's >> in need of updating then. I guess OSes may choose to work around >> some very common applications' bugs, but I'd then wonder on what >> basis "very common" gets established. I dislike the underlying >> asymmetry / inconsistency (if not unfairness) of such a model, >> despite seeing that there may be business reasons leading people to >> think they want something like this. > > The discussion seems to be geared towards buggy guest so far. However, > this is not the only reason that one my want to avoid exposing some > features: > > 1) From the recent security issues (such as XSA-343), a knob to > disable FIFO would be quite beneficials for vendors that don't need the > feature. Except that this wouldn't have been suitable as a during-embargo mitigation, for its observability by guests. > 2) Fleet management purpose. You may have a fleet with multiple > versions of Xen. You don't want your customer to start relying on > features that may not be available on all the hosts otherwise it > complicates the guest placement. Guests incapable to run on older Xen are a problem in this regard anyway, aren't they? And if they are capable, I don't see what you're referring to. > FAOD, I am sure there might be other features that need to be disabled. > But we have to start somewhere :). If there is such a need, then yes, sure. But shouldn't we at least gain rough agreement on how the future is going to look like with this? IOW have in hands some at least roughly agreed criteria by which we could decide which new ABI additions will need some form of override going forward (also allowing to judge which prior additions may want to gain overrides in a retroactive fashion, or in fact should have had ones from the beginning)? >>> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. 
You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer. >> >> Boot the guest on a not-yet-upgraded host again, to update its kernel? > > You are making the assumption that the customer would have the choice to > target a specific versions of Xen. This may be undesirable for a cloud > provider as suddenly your customer may want to stick on the old version > of Xen. I've gone from you saying "You really need a solution that can restore the old VM environment, at least temporarily, for that customer." The "temporarily" to me implies that it is at least an option to tie a certain guest to a certain Xen version for in-guest upgrading purposes. If the deal with the customer doesn't include running on a certain Xen version, I don't see how this could have non-temporary consequences to the cloud provider. Jan
Hi Jan, On 07/12/2020 09:17, Jan Beulich wrote: > On 04.12.2020 12:45, Julien Grall wrote: >> 1) From the recent security issues (such as XSA-343), a knob to >> disable FIFO would be quite beneficials for vendors that don't need the >> feature. > > Except that this wouldn't have been suitable as a during-embargo > mitigation, for its observability by guests. I think you are missing my point here. If that knob was available before the security event, a vendor may have already decided to use it to disable feature affected. This vendor would not have needed to spend time to either hotpatch or reboot all its fleet. >> 2) Fleet management purpose. You may have a fleet with multiple >> versions of Xen. You don't want your customer to start relying on >> features that may not be available on all the hosts otherwise it >> complicates the guest placement. > > Guests incapable to run on older Xen are a problem in this regard > anyway, aren't they? And if they are capable, I don't see what > you're referring to. It is not about guest that cannot run on older Xen. It is more about allowing a guest to use a feature that is not yet widely available in the fleet (you don't update all the hosts in a night...). Imagine the guest owner really wants to use feature A that is only available on new Xen version. The owner may have noticed the feature on an existing running guest and would like to create a new guest that use the feature. It might be possible that there are no capacity available on the new Xen version. So the guest may start on an older capacity. I can assure you that the owner will contact the customer service of the cloud provider to ask why the feature is not available on the new guest. With a knob available, a cloud provider has more flexibility to when the feature can be exposed. >> FAOD, I am sure there might be other features that need to be disabled. >> But we have to start somewhere :). > > If there is such a need, then yes, sure. But shouldn't we at least > gain rough agreement on how the future is going to look like with > this? IOW have in hands some at least roughly agreed criteria by > which we could decide which new ABI additions will need some form > of override going forward (also allowing to judge which prior > additions may want to gain overrides in a retroactive fashion, or > in fact should have had ones from the beginning)? I think the answer is quite straight-forward: Anything exposed to the non-privileged (I include stubdomain) guest should have a knob to disable it. > >>>> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer. >>> >>> Boot the guest on a not-yet-upgraded host again, to update its kernel? >> >> You are making the assumption that the customer would have the choice to >> target a specific versions of Xen. 
This may be undesirable for a cloud >> provider as suddenly your customer may want to stick on the old version >> of Xen. > > I've gone from you saying "You really need a solution that can restore > the old VM environment, at least temporarily, for that customer." The > "temporarily" to me implies that it is at least an option to tie a > certain guest to a certain Xen version for in-guest upgrading purposes. > > If the deal with the customer doesn't include running on a certain Xen > version, I don't see how this could have non-temporary consequences to > the cloud provider. I think by "you", you mean Paul and not me? Cheers,
On 07/12/2020 10:04, Julien Grall wrote: > Hi Jan, > > On 07/12/2020 09:17, Jan Beulich wrote: >> On 04.12.2020 12:45, Julien Grall wrote: >>> 1) From the recent security issues (such as XSA-343), a knob to >>> disable FIFO would be quite beneficials for vendors that don't need the >>> feature. >> >> Except that this wouldn't have been suitable as a during-embargo >> mitigation, for its observability by guests. > > I think you are missing my point here. If that knob was available before > the security event, a vendor may have already decided to use it to > disable feature affected. > > This vendor would not have needed to spend time to either hotpatch or > reboot all its fleet. > >>> 2) Fleet management purpose. You may have a fleet with multiple >>> versions of Xen. You don't want your customer to start relying on >>> features that may not be available on all the hosts otherwise it >>> complicates the guest placement. >> >> Guests incapable to run on older Xen are a problem in this regard >> anyway, aren't they? And if they are capable, I don't see what >> you're referring to. > > It is not about guest that cannot run on older Xen. It is more about > allowing a guest to use a feature that is not yet widely available in > the fleet (you don't update all the hosts in a night...). > > Imagine the guest owner really wants to use feature A that is only > available on new Xen version. The owner may have noticed the feature on > an existing running guest and would like to create a new guest that use > the feature. > > It might be possible that there are no capacity available on the new Xen > version. So the guest may start on an older capacity. > > I can assure you that the owner will contact the customer service of the > cloud provider to ask why the feature is not available on the new guest. > > With a knob available, a cloud provider has more flexibility to when the > feature can be exposed. > >>> FAOD, I am sure there might be other features that need to be disabled. >>> But we have to start somewhere :). >> >> If there is such a need, then yes, sure. But shouldn't we at least >> gain rough agreement on how the future is going to look like with >> this? IOW have in hands some at least roughly agreed criteria by >> which we could decide which new ABI additions will need some form >> of override going forward (also allowing to judge which prior >> additions may want to gain overrides in a retroactive fashion, or >> in fact should have had ones from the beginning)? > > I think the answer is quite straight-forward: Anything exposed to the > non-privileged (I include stubdomain) guest should have a knob to s/include/exclude/ > disable it. > >> >>>>> Now imagine you are the cloud provider, running Xen. What you did >>>>> was start to upgrade your hosts from an older version of Xen to a >>>>> newer version of Xen, to pick up various bug fixes and make sure >>>>> you are running a version that is within the security support >>>>> envelope. You identify that your customer's problem is a bug in >>>>> their OS that was latent on the old version of the hypervisor but >>>>> is now manifesting on the new one because it has buggy support for >>>>> a hypercall that was added between the two versions. How are you >>>>> going to fix this issue, and get your customer up and running >>>>> again? Of course you'd like your customer to upgrade their OS, but >>>>> they can't even boot it to do that. 
You really need a solution that >>>>> can restore the old VM environment, at least temporarily, for that >>>>> customer. >>>> >>>> Boot the guest on a not-yet-upgraded host again, to update its kernel? >>> >>> You are making the assumption that the customer would have the choice to >>> target a specific versions of Xen. This may be undesirable for a cloud >>> provider as suddenly your customer may want to stick on the old version >>> of Xen. >> >> I've gone from you saying "You really need a solution that can restore >> the old VM environment, at least temporarily, for that customer." The >> "temporarily" to me implies that it is at least an option to tie a >> certain guest to a certain Xen version for in-guest upgrading purposes. > > >> If the deal with the customer doesn't include running on a certain Xen >> version, I don't see how this could have non-temporary consequences to >> the cloud provider. > > I think by "you", you mean Paul and not me? > > Cheers, >
On 07.12.2020 11:04, Julien Grall wrote: > On 07/12/2020 09:17, Jan Beulich wrote: >> On 04.12.2020 12:45, Julien Grall wrote: >>> You are making the assumption that the customer would have the choice to >>> target a specific versions of Xen. This may be undesirable for a cloud >>> provider as suddenly your customer may want to stick on the old version >>> of Xen. >> >> I've gone from you saying "You really need a solution that can restore >> the old VM environment, at least temporarily, for that customer." The >> "temporarily" to me implies that it is at least an option to tie a >> certain guest to a certain Xen version for in-guest upgrading purposes. > > >> If the deal with the customer doesn't include running on a certain Xen >> version, I don't see how this could have non-temporary consequences to >> the cloud provider. > > I think by "you", you mean Paul and not me? Oh, right, I didn't pay attention to who wrote that text. Sorry. Jan
> -----Original Message----- [snip] > >> I've gone from you saying "You really need a solution that can restore > >> the old VM environment, at least temporarily, for that customer." The > >> "temporarily" to me implies that it is at least an option to tie a > >> certain guest to a certain Xen version for in-guest upgrading purposes. > > > Not necessarily. > >> If the deal with the customer doesn't include running on a certain Xen > >> version, I don't see how this could have non-temporary consequences to > >> the cloud provider. > > > > I think by "you", you mean Paul and not me? > > Oh, right, I didn't pay attention to who wrote that text. Sorry. > By temporary I mean that we may want to time-limit turning off a certain part of the ABI because, whilst it is problematic for some customers, it could (and is likely to) have measurable benefits to others. Thus you keep the feature off only until any customers running OSes that have problems have upgraded their installations. Paul
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 321a13e519b5..3ca9f00d6d83 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -607,6 +607,7 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
         .max_evtchn_port = b_info->event_channels,
         .max_grant_frames = b_info->max_grant_frames,
         .max_maptrack_frames = b_info->max_maptrack_frames,
+        .flags = XEN_DOMCTL_CDF_evtchn_fifo,
     };
 
     if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index e878699b0a1a..fa5c7b7eb0a2 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -65,6 +65,7 @@ type domain_create_flag =
   | CDF_XS_DOMAIN
   | CDF_IOMMU
   | CDF_NESTED_VIRT
+  | CDF_EVTCHN_FIFO
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index e64907df8e7e..a872002d90cc 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -58,6 +58,7 @@ type domain_create_flag =
   | CDF_XS_DOMAIN
   | CDF_IOMMU
   | CDF_NESTED_VIRT
+  | CDF_EVTCHN_FIFO
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 18cafcdda7b1..59f947370053 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
     unsigned int max_vcpus;
 
     /* HVM and HAP must be set. IOMMU may or may not be */
-    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
+    if ( (config->flags &
+          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo) !=
          (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
     {
         dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e824ba34b012..13d1e79f1463 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2478,7 +2478,8 @@ void __init create_domUs(void)
         struct domain *d;
         struct xen_domctl_createdomain d_cfg = {
             .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
-            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                     XEN_DOMCTL_CDF_evtchn_fifo,
             .max_evtchn_port = -1,
             .max_grant_frames = 64,
             .max_maptrack_frames = 1024,
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 7fcff9af2a7e..0267acfca16e 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
     struct bootmodule *xen_bootmodule;
     struct domain *dom0;
     struct xen_domctl_createdomain dom0_cfg = {
-        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                 XEN_DOMCTL_CDF_evtchn_fifo,
         .max_evtchn_port = -1,
         .max_grant_frames = gnttab_dom0_frames(),
         .max_maptrack_frames = -1,
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 30d6f375a3af..e558241c73da 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
                                          const char *loader)
 {
     struct xen_domctl_createdomain dom0_cfg = {
-        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
+                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
         .max_evtchn_port = -1,
         .max_grant_frames = -1,
         .max_maptrack_frames = -1,
diff --git a/xen/common/domain.c b/xen/common/domain.c
index f748806a450b..28592c7c8486 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
          ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
            XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
            XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
-           XEN_DOMCTL_CDF_nested_virt) )
+           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
     {
         dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
         return -EINVAL;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index dbfba62a4934..91133bf3c263 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1188,10 +1188,27 @@ static long evtchn_set_priority(const struct evtchn_set_priority *set_priority)
     return ret;
 }
 
+static bool is_fifo_op(int cmd)
+{
+    switch ( cmd )
+    {
+    case EVTCHNOP_init_control:
+    case EVTCHNOP_expand_array:
+    case EVTCHNOP_set_priority:
+        return true;
+    default:
+        return false;
+    }
+}
+
 long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
 
+    if ( !(current->domain->options & XEN_DOMCTL_CDF_evtchn_fifo) &&
+         is_fifo_op(cmd) )
+        return -ENOSYS;
+
     switch ( cmd )
     {
     case EVTCHNOP_alloc_unbound: {
@@ -1568,9 +1585,10 @@ static void domain_dump_evtchn_info(struct domain *d)
     unsigned int port;
     int irq;
 
-    printk("Event channel information for domain %d:\n"
-           "Polling vCPUs: {%*pbl}\n"
-           "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
+    printk("Event channel information for %pd:\n", d);
+    printk("ABI: %s\n", d->evtchn_fifo ? "FIFO" : "2-level");
+    printk("Polling vCPUs: {%*pbl}\n"
+           "    port [p/m/s]\n", d->max_vcpus, d->poll_mask);
 
     spin_lock(&d->event_lock);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 666aeb71bf1b..f7149c81a7c2 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -70,9 +70,11 @@ struct xen_domctl_createdomain {
 #define XEN_DOMCTL_CDF_iommu (1U<<_XEN_DOMCTL_CDF_iommu)
 #define _XEN_DOMCTL_CDF_nested_virt 6
 #define XEN_DOMCTL_CDF_nested_virt (1U << _XEN_DOMCTL_CDF_nested_virt)
+#define _XEN_DOMCTL_CDF_evtchn_fifo 7
+#define XEN_DOMCTL_CDF_evtchn_fifo (1U << _XEN_DOMCTL_CDF_evtchn_fifo)
 
 /* Max XEN_DOMCTL_CDF_* constant. Used for ABI checking. */
-#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
+#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_evtchn_fifo
 
     uint32_t flags;
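To show how the new flag is consumed, here is a minimal toolstack-side sketch. The structure, field names and XEN_DOMCTL_CDF_* constants come from the patch above; the build_create_config() helper, its allow_evtchn_fifo parameter and the include path are illustrative only. Leaving the flag clear is what makes EVTCHNOP_init_control, EVTCHNOP_expand_array and EVTCHNOP_set_priority return -ENOSYS to the guest, per the do_event_channel_op() hunk above.

/*
 * Sketch only: populate xen_domctl_createdomain so that FIFO event channels
 * are exposed to the new domain only when a (hypothetical) per-VM setting
 * allows it.  Struct and flag names are taken from the patch; the helper
 * and the allow_evtchn_fifo parameter are illustrative.
 */
#include <stdbool.h>
#include <string.h>
#include <xen/domctl.h>     /* Xen public header; include path illustrative */

static void build_create_config(struct xen_domctl_createdomain *cfg,
                                bool allow_evtchn_fifo)
{
    memset(cfg, 0, sizeof(*cfg));

    /* An HVM/HAP guest, mirroring the defaults used elsewhere in the patch. */
    cfg->flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap;

    /* Expose the FIFO ABI only when explicitly requested for this domain. */
    if ( allow_evtchn_fifo )
        cfg->flags |= XEN_DOMCTL_CDF_evtchn_fifo;

    cfg->max_evtchn_port = -1;
    cfg->max_grant_frames = -1;
    cfg->max_maptrack_frames = -1;
}

In other words, the defaults in the patch keep FIFO exposed for dom0 and for the existing libxl path, and a per-domain opt-out only needs to leave one bit clear at creation time.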