
[v5,1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo, ...

Message ID 20201203124159.3688-2-paul@xen.org (mailing list archive)
State New, archived
Series: Xen ABI feature control

Commit Message

Paul Durrant Dec. 3, 2020, 12:41 p.m. UTC
From: Paul Durrant <pdurrant@amazon.com>

...to control the visibility of the FIFO event channel operations
(EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
the guest.

These operations were added to the public header in commit d2d50c2f308f
("evtchn: add FIFO-based event channel ABI") and the first implementation
appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
that, a guest issuing those operations would receive a return value of
-ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
running on an older (pre-4.4) Xen would fall back to using the 2-level event
channel interface upon seeing this return value.

Unfortunately, the uncontrollable appearance of these new operations from Xen 4.4
onwards has implications for hibernation of some Linux guests. During resume
from hibernation, there are two kernels involved: the "boot" kernel and the
"resume" kernel. The guest boot kernel may default to using FIFO operations and
instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
other hand, the resume kernel keeps assuming 2-level, because it was hibernated
on a version of Xen that did not support the FIFO operations.

To maintain compatibility, it is necessary to make Xen behave as it did
before the new operations were added. Hence the code in this patch ensures
that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
operations will again result in -ENOSYS being returned to the guest.

This patch also adds an extra log line into the 'e' key handler output to
call out which event channel ABI is in use by a domain.
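
For illustration, the shape of the guard this patch adds is roughly as in the
following minimal, self-contained C sketch. The flag name matches the patch,
but the structure, helper name and bit position below are simplified stand-ins
for Xen's internals, not the actual implementation:

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Stand-in for the new creation flag; the bit position is illustrative. */
    #define XEN_DOMCTL_CDF_evtchn_fifo (1u << 7)

    /* Stand-in for struct domain: just the creation flags recorded at build time. */
    struct domain {
        uint32_t options;
    };

    /*
     * Model of a FIFO sub-op entry point (e.g. EVTCHNOP_init_control): refuse
     * with -ENOSYS, as pre-4.4 Xen did, unless the domain was created with
     * XEN_DOMCTL_CDF_evtchn_fifo set.
     */
    static int evtchn_fifo_init_control(const struct domain *d)
    {
        if ( !(d->options & XEN_DOMCTL_CDF_evtchn_fifo) )
            return -ENOSYS;
        /* ... the real implementation switches the domain to the FIFO ABI ... */
        return 0;
    }

    int main(void)
    {
        struct domain legacy = { .options = 0 };
        struct domain fifo   = { .options = XEN_DOMCTL_CDF_evtchn_fifo };

        printf("legacy domain: %d\n", evtchn_fifo_init_control(&legacy)); /* -ENOSYS */
        printf("fifo domain:   %d\n", evtchn_fifo_init_control(&fifo));   /* 0 */
        return 0;
    }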

NOTE: To maintain current behavior, until a tool-stack option is added to
      control the flag, it is unconditionally set for all domains. A
      subsequent patch will introduce such tool-stack control.

Signed-off-by: Paul Durrant <pdurrant@amazon.com>
Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
---
Cc: Ian Jackson <iwj@xenproject.org>
Cc: Wei Liu <wl@xen.org>
Cc: Anthony PERARD <anthony.perard@citrix.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: George Dunlap <george.dunlap@citrix.com>
Cc: Jan Beulich <jbeulich@suse.com>
Cc: Julien Grall <julien@xen.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Christian Lindig <christian.lindig@citrix.com>
Cc: David Scott <dave@recoil.org>
Cc: Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>
Cc: "Roger Pau Monné" <roger.pau@citrix.com>

v5:
 - Flip the sense of the flag from disabling to enabling, as requested by
   Andrew

v4:
 - New in v4
---
 tools/libs/light/libxl_create.c |  1 +
 tools/ocaml/libs/xc/xenctrl.ml  |  1 +
 tools/ocaml/libs/xc/xenctrl.mli |  1 +
 xen/arch/arm/domain.c           |  3 ++-
 xen/arch/arm/domain_build.c     |  3 ++-
 xen/arch/arm/setup.c            |  3 ++-
 xen/arch/x86/setup.c            |  3 ++-
 xen/common/domain.c             |  2 +-
 xen/common/event_channel.c      | 24 +++++++++++++++++++++---
 xen/include/public/domctl.h     |  4 +++-
 10 files changed, 36 insertions(+), 9 deletions(-)

Comments

Jan Beulich Dec. 3, 2020, 3:22 p.m. UTC | #1
On 03.12.2020 13:41, Paul Durrant wrote:
> From: Paul Durrant <pdurrant@amazon.com>
> 
> ...to control the visibility of the FIFO event channel operations
> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> the guest.
> 
> These operations were added to the public header in commit d2d50c2f308f
> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> that, a guest issuing those operations would receive a return value of
> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> running on an older (pre-4.4) Xen would fall back to using the 2-level event
> channel interface upon seeing this return value.
> 
> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> onwards has implications for hibernation of some Linux guests. During resume
> from hibernation, there are two kernels involved: the "boot" kernel and the
> "resume" kernel. The guest boot kernel may default to use FIFO operations and
> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> on a version of Xen that did not support the FIFO operations.
> 
> To maintain compatibility it is necessary to make Xen behave as it did
> before the new operations were added and hence the code in this patch ensures
> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
> operations will again result in -ENOSYS being returned to the guest.

I have to admit I'm now even more concerned about such a control
going into Xen, all the more so with the now 2nd use in the subsequent patch.
The implication of this would seem to be that whenever we add new
hypercalls or sub-ops, a domain creation control would also need
adding, determining whether that new sub-op is actually okay for a
guest to use. Or else I'd be keen to see, up front, criteria at least roughly
outlined by which it could be established whether such an override
control is needed.

I'm also not convinced such controls really want to be opt-in rather
than opt-out. While perhaps sensible as long as a feature is
experimental, not exposing stuff by default may mean slower adoption
of new (and hopefully better) functionality. I realize there's still
the option of having the tool stack default to enable, and just the
hypervisor defaulting to disable, but anyway.

> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>      unsigned int max_vcpus;
>  
>      /* HVM and HAP must be set. IOMMU may or may not be */
> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> +    if ( (config->flags &
> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
>      {
>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> --- a/xen/arch/arm/domain_build.c
> +++ b/xen/arch/arm/domain_build.c
> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
>          struct domain *d;
>          struct xen_domctl_createdomain d_cfg = {
>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> +                     XEN_DOMCTL_CDF_evtchn_fifo,
>              .max_evtchn_port = -1,
>              .max_grant_frames = 64,
>              .max_maptrack_frames = 1024,
> --- a/xen/arch/arm/setup.c
> +++ b/xen/arch/arm/setup.c
> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>      struct bootmodule *xen_bootmodule;
>      struct domain *dom0;
>      struct xen_domctl_createdomain dom0_cfg = {
> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> +                 XEN_DOMCTL_CDF_evtchn_fifo,
>          .max_evtchn_port = -1,
>          .max_grant_frames = gnttab_dom0_frames(),
>          .max_maptrack_frames = -1,
> --- a/xen/arch/x86/setup.c
> +++ b/xen/arch/x86/setup.c
> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
>                                           const char *loader)
>  {
>      struct xen_domctl_createdomain dom0_cfg = {
> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
>          .max_evtchn_port = -1,
>          .max_grant_frames = -1,
>          .max_maptrack_frames = -1,
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> -           XEN_DOMCTL_CDF_nested_virt) )
> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
>      {
>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
>          return -EINVAL;

All of the hunks above point out a scalability issue if we were to
follow this route for even just a fair part of new sub-ops, and I
suppose you've noticed this with the next patch presumably touching
all the same places again.

Jan
Paul Durrant Dec. 3, 2020, 3:45 p.m. UTC | #2
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 03 December 2020 15:23
> To: Paul Durrant <paul@xen.org>
> Cc: Paul Durrant <pdurrant@amazon.com>; Eslam Elnikety <elnikety@amazon.com>; Ian Jackson
> <iwj@xenproject.org>; Wei Liu <wl@xen.org>; Anthony PERARD <anthony.perard@citrix.com>; Andrew Cooper
> <andrew.cooper3@citrix.com>; George Dunlap <george.dunlap@citrix.com>; Julien Grall <julien@xen.org>;
> Stefano Stabellini <sstabellini@kernel.org>; Christian Lindig <christian.lindig@citrix.com>; David
> Scott <dave@recoil.org>; Volodymyr Babchuk <Volodymyr_Babchuk@epam.com>; Roger Pau Monné
> <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo,
> ...
> 
> On 03.12.2020 13:41, Paul Durrant wrote:
> > From: Paul Durrant <pdurrant@amazon.com>
> >
> > ...to control the visibility of the FIFO event channel operations
> > (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> > the guest.
> >
> > These operations were added to the public header in commit d2d50c2f308f
> > ("evtchn: add FIFO-based event channel ABI") and the first implementation
> > appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> > EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> > ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> > that, a guest issuing those operations would receive a return value of
> > -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> > running on an older (pre-4.4) Xen would fall back to using the 2-level event
> > channel interface upon seeing this return value.
> >
> > Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> > onwards has implications for hibernation of some Linux guests. During resume
> > from hibernation, there are two kernels involved: the "boot" kernel and the
> > "resume" kernel. The guest boot kernel may default to use FIFO operations and
> > instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> > other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> > on a version of Xen that did not support the FIFO operations.
> >
> > To maintain compatibility it is necessary to make Xen behave as it did
> > before the new operations were added and hence the code in this patch ensures
> > that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
> > operations will again result in -ENOSYS being returned to the guest.
> 
> I have to admit I'm now even more concerned of the control for such
> going into Xen, the more with the now 2nd use in the subsequent patch.
> The implication of this would seem to be that whenever we add new
> hypercalls or sub-ops, a domain creation control would also need
> adding determining whether that new sub-op is actually okay to use by
> a guest. Or else I'd be keen to up front see criteria at least roughly
> outlined by which it could be established whether such an override
> control is needed.
> 

Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be opt-in on a per-domain basis, so that we know that, from when a domain is first created, it will not see a change in its environment unless the VM administrator wants that to happen.

> I'm also not convinced such controls really want to be opt-in rather
> than opt-out.

They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated hypervisor version without risking crashing their VMs.

> While perhaps sensible as long as a feature is
> experimental, not exposing stuff by default may mean slower adoption
> of new (and hopefully better) functionality. I realize there's still
> the option of having the tool stack default to enable, and just the
> hypervisor defaulting to disable, but anyway.
> 

Ok. I don't see a problem with default-to-enable behaviour... but I guess we will need to add ABI features to the migration stream to fix things properly.

> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >      unsigned int max_vcpus;
> >
> >      /* HVM and HAP must be set. IOMMU may or may not be */
> > -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> > +    if ( (config->flags &
> > +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
> >           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
> >      {
> >          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
> >          struct domain *d;
> >          struct xen_domctl_createdomain d_cfg = {
> >              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> > -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> > +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> > +                     XEN_DOMCTL_CDF_evtchn_fifo,
> >              .max_evtchn_port = -1,
> >              .max_grant_frames = 64,
> >              .max_maptrack_frames = 1024,
> > --- a/xen/arch/arm/setup.c
> > +++ b/xen/arch/arm/setup.c
> > @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
> >      struct bootmodule *xen_bootmodule;
> >      struct domain *dom0;
> >      struct xen_domctl_createdomain dom0_cfg = {
> > -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> > +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> > +                 XEN_DOMCTL_CDF_evtchn_fifo,
> >          .max_evtchn_port = -1,
> >          .max_grant_frames = gnttab_dom0_frames(),
> >          .max_maptrack_frames = -1,
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
> >                                           const char *loader)
> >  {
> >      struct xen_domctl_createdomain dom0_cfg = {
> > -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> > +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> > +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
> >          .max_evtchn_port = -1,
> >          .max_grant_frames = -1,
> >          .max_maptrack_frames = -1,
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
> >           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
> >             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> > -           XEN_DOMCTL_CDF_nested_virt) )
> > +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
> >      {
> >          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
> >          return -EINVAL;
> 
> All of the hunks above point out a scalability issue if we were to
> follow this route for even just a fair part of new sub-ops, and I
> suppose you've noticed this with the next patch presumably touching
> all the same places again.

Indeed. This solution works for now but is probably not what we want in the long run. Same goes for the current way we control viridian features via an HVM param. It is good enough for now IMO since domctl is not a stable interface. Any ideas about how we might implement a better interface in the longer term?

  Paul

> 
> Jan
Jan Beulich Dec. 3, 2020, 3:56 p.m. UTC | #3
On 03.12.2020 16:45, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 03 December 2020 15:23
>>
>> On 03.12.2020 13:41, Paul Durrant wrote:
>>> From: Paul Durrant <pdurrant@amazon.com>
>>>
>>> ...to control the visibility of the FIFO event channel operations
>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
>>> the guest.
>>>
>>> These operations were added to the public header in commit d2d50c2f308f
>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
>>> that, a guest issuing those operations would receive a return value of
>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
>>> channel interface upon seeing this return value.
>>>
>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
>>> onwards has implications for hibernation of some Linux guests. During resume
>>> from hibernation, there are two kernels involved: the "boot" kernel and the
>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
>>> on a version of Xen that did not support the FIFO operations.
>>>
>>> To maintain compatibility it is necessary to make Xen behave as it did
>>> before the new operations were added and hence the code in this patch ensures
>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
>>> operations will again result in -ENOSYS being returned to the guest.
>>
>> I have to admit I'm now even more concerned of the control for such
>> going into Xen, the more with the now 2nd use in the subsequent patch.
>> The implication of this would seem to be that whenever we add new
>> hypercalls or sub-ops, a domain creation control would also need
>> adding determining whether that new sub-op is actually okay to use by
>> a guest. Or else I'd be keen to up front see criteria at least roughly
>> outlined by which it could be established whether such an override
>> control is needed.
>>
> 
> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see a change in its environment unless the VM administrator wants that to happen.

A new hypercall appearing is a change to the guest's environment, yes,
but a backwards compatible one. I don't see how this would harm a guest.
This and ...

>> I'm also not convinced such controls really want to be opt-in rather
>> than opt-out.
> 
> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated hypervisor version without risking crashing their VMs.

... this sounds to me more like workarounds for buggy guests than
functionality the hypervisor _needs_ to have. (I can appreciate
the specific case here for the specific scenario you provide as
an exception.)

>>> --- a/xen/arch/arm/domain.c
>>> +++ b/xen/arch/arm/domain.c
>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>      unsigned int max_vcpus;
>>>
>>>      /* HVM and HAP must be set. IOMMU may or may not be */
>>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
>>> +    if ( (config->flags &
>>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
>>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
>>>      {
>>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
>>> --- a/xen/arch/arm/domain_build.c
>>> +++ b/xen/arch/arm/domain_build.c
>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
>>>          struct domain *d;
>>>          struct xen_domctl_createdomain d_cfg = {
>>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
>>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
>>>              .max_evtchn_port = -1,
>>>              .max_grant_frames = 64,
>>>              .max_maptrack_frames = 1024,
>>> --- a/xen/arch/arm/setup.c
>>> +++ b/xen/arch/arm/setup.c
>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>>>      struct bootmodule *xen_bootmodule;
>>>      struct domain *dom0;
>>>      struct xen_domctl_createdomain dom0_cfg = {
>>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
>>>          .max_evtchn_port = -1,
>>>          .max_grant_frames = gnttab_dom0_frames(),
>>>          .max_maptrack_frames = -1,
>>> --- a/xen/arch/x86/setup.c
>>> +++ b/xen/arch/x86/setup.c
>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
>>>                                           const char *loader)
>>>  {
>>>      struct xen_domctl_createdomain dom0_cfg = {
>>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
>>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
>>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
>>>          .max_evtchn_port = -1,
>>>          .max_grant_frames = -1,
>>>          .max_maptrack_frames = -1,
>>> --- a/xen/common/domain.c
>>> +++ b/xen/common/domain.c
>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
>>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
>>> -           XEN_DOMCTL_CDF_nested_virt) )
>>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
>>>      {
>>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
>>>          return -EINVAL;
>>
>> All of the hunks above point out a scalability issue if we were to
>> follow this route for even just a fair part of new sub-ops, and I
>> suppose you've noticed this with the next patch presumably touching
>> all the same places again.
> 
> Indeed. This solution works for now but is probably not what we want in the long run. Same goes for the current way we control viridian features via an HVM param. It is good enough for now IMO since domctl is not a stable interface. Any ideas about how we might implement a better interface in the longer term?

While it has other downsides, Jürgen's proposal doesn't have any
similar scalability issue afaics. Another possible model would
seem to be to key new hypercalls to hypervisor CPUID leaf bits,
and derive their availability from a guest's CPUID policy. Of
course this won't work when needing to retrofit guarding like
you want to do here.
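
(Purely as an illustration of that model -- a generic, self-contained sketch
with made-up names, not Xen's actual cpuid_policy machinery -- such a
policy-keyed guard could look like:)

    #include <errno.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Made-up feature bit, standing in for a hypervisor CPUID leaf bit that
     * would be mirrored into the guest's CPUID policy at creation time. */
    #define HVLEAF_FEAT_EVTCHN_FIFO (1u << 0)

    /* Stand-in for the per-guest CPUID policy. */
    struct guest_policy {
        uint32_t hv_leaf_features;
    };

    /*
     * A new sub-op is allowed only if the corresponding feature bit is set in
     * the guest's policy, so what the guest is told via CPUID and what it may
     * actually call cannot diverge, including across migration.
     */
    static int do_fifo_subop(const struct guest_policy *p)
    {
        if ( !(p->hv_leaf_features & HVLEAF_FEAT_EVTCHN_FIFO) )
            return -ENOSYS;
        return 0;
    }

    int main(void)
    {
        struct guest_policy old_policy = { .hv_leaf_features = 0 };
        struct guest_policy new_policy = { .hv_leaf_features = HVLEAF_FEAT_EVTCHN_FIFO };

        printf("%d %d\n", do_fifo_subop(&old_policy), do_fifo_subop(&new_policy));
        return 0;
    }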

Jan
Paul Durrant Dec. 3, 2020, 5:07 p.m. UTC | #4
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 03 December 2020 15:57
> To: paul@xen.org
> Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson'
> <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew
> Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall'
> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig'
> <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk'
> <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo,
> ...
> 
> On 03.12.2020 16:45, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 03 December 2020 15:23
> >>
> >> On 03.12.2020 13:41, Paul Durrant wrote:
> >>> From: Paul Durrant <pdurrant@amazon.com>
> >>>
> >>> ...to control the visibility of the FIFO event channel operations
> >>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> >>> the guest.
> >>>
> >>> These operations were added to the public header in commit d2d50c2f308f
> >>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> >>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> >>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> >>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> >>> that, a guest issuing those operations would receive a return value of
> >>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> >>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
> >>> channel interface upon seeing this return value.
> >>>
> >>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> >>> onwards has implications for hibernation of some Linux guests. During resume
> >>> from hibernation, there are two kernels involved: the "boot" kernel and the
> >>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
> >>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> >>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> >>> on a version of Xen that did not support the FIFO operations.
> >>>
> >>> To maintain compatibility it is necessary to make Xen behave as it did
> >>> before the new operations were added and hence the code in this patch ensures
> >>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
> >>> operations will again result in -ENOSYS being returned to the guest.
> >>
> >> I have to admit I'm now even more concerned of the control for such
> >> going into Xen, the more with the now 2nd use in the subsequent patch.
> >> The implication of this would seem to be that whenever we add new
> >> hypercalls or sub-ops, a domain creation control would also need
> >> adding determining whether that new sub-op is actually okay to use by
> >> a guest. Or else I'd be keen to up front see criteria at least roughly
> >> outlined by which it could be established whether such an override
> >> control is needed.
> >>
> >
> > Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be
> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see
> a change in its environment unless the VM administrator wants that to happen.
> 
> A new hypercall appearing is a change to the guest's environment, yes,
> but a backwards compatible one. I don't see how this would harm a guest.

Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes.

> This and ...
> 
> >> I'm also not convinced such controls really want to be opt-in rather
> >> than opt-out.
> >
> > They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a
> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated
> hypervisor version without risking crashing their VMs.
> 
> ... this sound to me more like workarounds for buggy guests than
> functionality the hypervisor _needs_ to have. (I can appreciate
> the specific case here for the specific scenario you provide as
> an exception.)

If we want to have a hypervisor that can be used in a cloud environment then Xen absolutely needs this capability.

> 
> >>> --- a/xen/arch/arm/domain.c
> >>> +++ b/xen/arch/arm/domain.c
> >>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>      unsigned int max_vcpus;
> >>>
> >>>      /* HVM and HAP must be set. IOMMU may or may not be */
> >>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> >>> +    if ( (config->flags &
> >>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
> >>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> >>> --- a/xen/arch/arm/domain_build.c
> >>> +++ b/xen/arch/arm/domain_build.c
> >>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
> >>>          struct domain *d;
> >>>          struct xen_domctl_createdomain d_cfg = {
> >>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> >>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
> >>>              .max_evtchn_port = -1,
> >>>              .max_grant_frames = 64,
> >>>              .max_maptrack_frames = 1024,
> >>> --- a/xen/arch/arm/setup.c
> >>> +++ b/xen/arch/arm/setup.c
> >>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
> >>>      struct bootmodule *xen_bootmodule;
> >>>      struct domain *dom0;
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = gnttab_dom0_frames(),
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/arch/x86/setup.c
> >>> +++ b/xen/arch/x86/setup.c
> >>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
> >>>                                           const char *loader)
> >>>  {
> >>>      struct xen_domctl_createdomain dom0_cfg = {
> >>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> >>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> >>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
> >>>          .max_evtchn_port = -1,
> >>>          .max_grant_frames = -1,
> >>>          .max_maptrack_frames = -1,
> >>> --- a/xen/common/domain.c
> >>> +++ b/xen/common/domain.c
> >>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
> >>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> >>> -           XEN_DOMCTL_CDF_nested_virt) )
> >>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
> >>>      {
> >>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
> >>>          return -EINVAL;
> >>
> >> All of the hunks above point out a scalability issue if we were to
> >> follow this route for even just a fair part of new sub-ops, and I
> >> suppose you've noticed this with the next patch presumably touching
> >> all the same places again.
> >
> > Indeed. This solution works for now but is probably not what we want in the long run. Same goes for
> the current way we control viridian features via an HVM param. It is good enough for now IMO since
> domctl is not a stable interface. Any ideas about how we might implement a better interface in the
> longer term?
> 
> While it has other downsides, Jürgen's proposal doesn't have any
> similar scalability issue afaics. Another possible model would
> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
> and derive their availability from a guest's CPUID policy. Of
> course this won't work when needing to retrofit guarding like
> you want to do here.
> 

Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.

  Paul
Jürgen Groß Dec. 3, 2020, 5:19 p.m. UTC | #5
On 03.12.20 18:07, Paul Durrant wrote:
>> -----Original Message-----
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 03 December 2020 15:57
>> To: paul@xen.org
>> Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson'
>> <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew
>> Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall'
>> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig'
>> <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk'
>> <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
>> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo,
>> ...
>>
>> On 03.12.2020 16:45, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 03 December 2020 15:23
>>>>
>>>> On 03.12.2020 13:41, Paul Durrant wrote:
>>>>> From: Paul Durrant <pdurrant@amazon.com>
>>>>>
>>>>> ...to control the visibility of the FIFO event channel operations
>>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
>>>>> the guest.
>>>>>
>>>>> These operations were added to the public header in commit d2d50c2f308f
>>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
>>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
>>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
>>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
>>>>> that, a guest issuing those operations would receive a return value of
>>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
>>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
>>>>> channel interface upon seeing this return value.
>>>>>
>>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
>>>>> onwards has implications for hibernation of some Linux guests. During resume
>>>>> from hibernation, there are two kernels involved: the "boot" kernel and the
>>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
>>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
>>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
>>>>> on a version of Xen that did not support the FIFO operations.
>>>>>
>>>>> To maintain compatibility it is necessary to make Xen behave as it did
>>>>> before the new operations were added and hence the code in this patch ensures
>>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
>>>>> operations will again result in -ENOSYS being returned to the guest.
>>>>
>>>> I have to admit I'm now even more concerned of the control for such
>>>> going into Xen, the more with the now 2nd use in the subsequent patch.
>>>> The implication of this would seem to be that whenever we add new
>>>> hypercalls or sub-ops, a domain creation control would also need
>>>> adding determining whether that new sub-op is actually okay to use by
>>>> a guest. Or else I'd be keen to up front see criteria at least roughly
>>>> outlined by which it could be established whether such an override
>>>> control is needed.
>>>>
>>>
>>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be
>> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see
>> a change in its environment unless the VM administrator wants that to happen.
>>
>> A new hypercall appearing is a change to the guest's environment, yes,
>> but a backwards compatible one. I don't see how this would harm a guest.
> 
> Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes.
> 
>> This and ...
>>
>>>> I'm also not convinced such controls really want to be opt-in rather
>>>> than opt-out.
>>>
>>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a
>> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated
>> hypervisor version without risking crashing their VMs.
>>
>> ... this sound to me more like workarounds for buggy guests than
>> functionality the hypervisor _needs_ to have. (I can appreciate
>> the specific case here for the specific scenario you provide as
>> an exception.)
> 
> If we want to have a hypervisor that can be used in a cloud environment then Xen absolutely needs this capability.
> 
>>
>>>>> --- a/xen/arch/arm/domain.c
>>>>> +++ b/xen/arch/arm/domain.c
>>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>>>       unsigned int max_vcpus;
>>>>>
>>>>>       /* HVM and HAP must be set. IOMMU may or may not be */
>>>>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
>>>>> +    if ( (config->flags &
>>>>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
>>>>>            (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
>>>>>       {
>>>>>           dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
>>>>> --- a/xen/arch/arm/domain_build.c
>>>>> +++ b/xen/arch/arm/domain_build.c
>>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
>>>>>           struct domain *d;
>>>>>           struct xen_domctl_createdomain d_cfg = {
>>>>>               .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
>>>>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>>>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
>>>>>               .max_evtchn_port = -1,
>>>>>               .max_grant_frames = 64,
>>>>>               .max_maptrack_frames = 1024,
>>>>> --- a/xen/arch/arm/setup.c
>>>>> +++ b/xen/arch/arm/setup.c
>>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>>>>>       struct bootmodule *xen_bootmodule;
>>>>>       struct domain *dom0;
>>>>>       struct xen_domctl_createdomain dom0_cfg = {
>>>>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>>>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
>>>>>           .max_evtchn_port = -1,
>>>>>           .max_grant_frames = gnttab_dom0_frames(),
>>>>>           .max_maptrack_frames = -1,
>>>>> --- a/xen/arch/x86/setup.c
>>>>> +++ b/xen/arch/x86/setup.c
>>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
>>>>>                                            const char *loader)
>>>>>   {
>>>>>       struct xen_domctl_createdomain dom0_cfg = {
>>>>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
>>>>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
>>>>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
>>>>>           .max_evtchn_port = -1,
>>>>>           .max_grant_frames = -1,
>>>>>           .max_maptrack_frames = -1,
>>>>> --- a/xen/common/domain.c
>>>>> +++ b/xen/common/domain.c
>>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>>>            ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>>              XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
>>>>>              XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
>>>>> -           XEN_DOMCTL_CDF_nested_virt) )
>>>>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
>>>>>       {
>>>>>           dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
>>>>>           return -EINVAL;
>>>>
>>>> All of the hunks above point out a scalability issue if we were to
>>>> follow this route for even just a fair part of new sub-ops, and I
>>>> suppose you've noticed this with the next patch presumably touching
>>>> all the same places again.
>>>
>>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes for
>> the current way we control viridian features via an HVM param. It is good enough for now IMO since
>> domctl is not a stable interface. Any ideas about how we might implement a better interface in the
>> longer term?
>>
>> While it has other downsides, Jürgen's proposal doesn't have any
>> similar scalability issue afaics. Another possible model would
>> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
>> and derive their availability from a guest's CPUID policy. Of
>> course this won't work when needing to retrofit guarding like
>> you want to do here.
>>
> 
> Ok, I'll take a look hypfs as an immediate solution, if that's preferred.

Paul, if you'd like me to add a few patches to my series doing the basic
framework (per-domain nodes, interfaces for setting struct domain fields
via hypfs), I can do that. You could then concentrate on the tools side.


Juergen
Paul Durrant Dec. 3, 2020, 6:44 p.m. UTC | #6
> -----Original Message-----
[snip]
> >>>>
> >>>> All of the hunks above point out a scalability issue if we were to
> >>>> follow this route for even just a fair part of new sub-ops, and I
> >>>> suppose you've noticed this with the next patch presumably touching
> >>>> all the same places again.
> >>>
> >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes
> for
> >> the current way we control viridian features via an HVM param. It is good enough for now IMO since
> >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the
> >> longer term?
> >>
> >> While it has other downsides, Jürgen's proposal doesn't have any
> >> similar scalability issue afaics. Another possible model would
> >> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
> >> and derive their availability from a guest's CPUID policy. Of
> >> course this won't work when needing to retrofit guarding like
> >> you want to do here.
> >>
> >
> > Ok, I'll take a look hypfs as an immediate solution, if that's preferred.
> 
> Paul, if you'd like me to add a few patches to my series doing the basic
> framework (per-domain nodes, interfaces for setting struct domain fields
> via hypfs), I can do that. You could then concentrate on the tools side.
> 

That would be very helpful. Thanks Juergen.

  Paul
Jan Beulich Dec. 4, 2020, 7:53 a.m. UTC | #7
On 03.12.2020 18:07, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 03 December 2020 15:57
>>
>> On 03.12.2020 16:45, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 03 December 2020 15:23
>>>>
>>>> On 03.12.2020 13:41, Paul Durrant wrote:
>>>>> From: Paul Durrant <pdurrant@amazon.com>
>>>>>
>>>>> ...to control the visibility of the FIFO event channel operations
>>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
>>>>> the guest.
>>>>>
>>>>> These operations were added to the public header in commit d2d50c2f308f
>>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
>>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
>>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
>>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
>>>>> that, a guest issuing those operations would receive a return value of
>>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
>>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
>>>>> channel interface upon seeing this return value.
>>>>>
>>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
>>>>> onwards has implications for hibernation of some Linux guests. During resume
>>>>> from hibernation, there are two kernels involved: the "boot" kernel and the
>>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
>>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
>>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
>>>>> on a version of Xen that did not support the FIFO operations.
>>>>>
>>>>> To maintain compatibility it is necessary to make Xen behave as it did
>>>>> before the new operations were added and hence the code in this patch ensures
>>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
>>>>> operations will again result in -ENOSYS being returned to the guest.
>>>>
>>>> I have to admit I'm now even more concerned of the control for such
>>>> going into Xen, the more with the now 2nd use in the subsequent patch.
>>>> The implication of this would seem to be that whenever we add new
>>>> hypercalls or sub-ops, a domain creation control would also need
>>>> adding determining whether that new sub-op is actually okay to use by
>>>> a guest. Or else I'd be keen to up front see criteria at least roughly
>>>> outlined by which it could be established whether such an override
>>>> control is needed.
>>>>
>>>
>>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be
>> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not see
>> a change in its environment unless the VM administrator wants that to happen.
>>
>> A new hypercall appearing is a change to the guest's environment, yes,
>> but a backwards compatible one. I don't see how this would harm a guest.
> 
> Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst case, the guest is sufficiently confused that it crashes.

If a guest recognizes a new hypercall is unavailable, why would it ever try
to make use of it again, unless it knows it may appear after having been
migrated (which implies it's prepared for the sudden appearance)? This to
me still looks like an attempt to work around guest bugs. And a halfhearted
one, as you cherry pick just two out of many additions to the original 3.0
ABI.

>> This and ...
>>
>>>> I'm also not convinced such controls really want to be opt-in rather
>>>> than opt-out.
>>>
>>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a
>> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated
>> hypervisor version without risking crashing their VMs.
>>
>> ... this sound to me more like workarounds for buggy guests than
>> functionality the hypervisor _needs_ to have. (I can appreciate
>> the specific case here for the specific scenario you provide as
>> an exception.)
> 
> If we want to have a hypervisor that can be used in a cloud environment
> then Xen absolutely needs this capability.

As per above you can conclude that I'm still struggling to see the
"why" part here.

>>>>> --- a/xen/arch/arm/domain.c
>>>>> +++ b/xen/arch/arm/domain.c
>>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>>>      unsigned int max_vcpus;
>>>>>
>>>>>      /* HVM and HAP must be set. IOMMU may or may not be */
>>>>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
>>>>> +    if ( (config->flags &
>>>>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
>>>>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
>>>>>      {
>>>>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
>>>>> --- a/xen/arch/arm/domain_build.c
>>>>> +++ b/xen/arch/arm/domain_build.c
>>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
>>>>>          struct domain *d;
>>>>>          struct xen_domctl_createdomain d_cfg = {
>>>>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
>>>>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>>>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
>>>>>              .max_evtchn_port = -1,
>>>>>              .max_grant_frames = 64,
>>>>>              .max_maptrack_frames = 1024,
>>>>> --- a/xen/arch/arm/setup.c
>>>>> +++ b/xen/arch/arm/setup.c
>>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
>>>>>      struct bootmodule *xen_bootmodule;
>>>>>      struct domain *dom0;
>>>>>      struct xen_domctl_createdomain dom0_cfg = {
>>>>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
>>>>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
>>>>>          .max_evtchn_port = -1,
>>>>>          .max_grant_frames = gnttab_dom0_frames(),
>>>>>          .max_maptrack_frames = -1,
>>>>> --- a/xen/arch/x86/setup.c
>>>>> +++ b/xen/arch/x86/setup.c
>>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
>>>>>                                           const char *loader)
>>>>>  {
>>>>>      struct xen_domctl_createdomain dom0_cfg = {
>>>>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
>>>>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
>>>>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
>>>>>          .max_evtchn_port = -1,
>>>>>          .max_grant_frames = -1,
>>>>>          .max_maptrack_frames = -1,
>>>>> --- a/xen/common/domain.c
>>>>> +++ b/xen/common/domain.c
>>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
>>>>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
>>>>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
>>>>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
>>>>> -           XEN_DOMCTL_CDF_nested_virt) )
>>>>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
>>>>>      {
>>>>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
>>>>>          return -EINVAL;
>>>>
>>>> All of the hunks above point out a scalability issue if we were to
>>>> follow this route for even just a fair part of new sub-ops, and I
>>>> suppose you've noticed this with the next patch presumably touching
>>>> all the same places again.
>>>
>>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes for
>> the current way we control viridian features via an HVM param. It is good enough for now IMO since
>> domctl is not a stable interface. Any ideas about how we might implement a better interface in the
>> longer term?
>>
>> While it has other downsides, Jürgen's proposal doesn't have any
>> similar scalability issue afaics. Another possible model would
>> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
>> and derive their availability from a guest's CPUID policy. Of
>> course this won't work when needing to retrofit guarding like
>> you want to do here.
> 
> Ok, I'll take a look hypfs as an immediate solution, if that's preferred.

Well, as said - there are also downsides with that approach. I'm
not convinced it should be just the three of us to determine which
one is better overall, the more that you don't seem to be convinced
anyway (and I'm not going to claim I am, in either direction). It
is my understanding that based on the claimed need for this (see
above), this may become very widely used functionality, and if so
we want to make sure we won't regret the route we went.

For starters, just to get a better overall picture, besides the two
overrides you add here, do you have any plans to retroactively add
further controls for past ABI additions? I don't suppose you have
plans to consistently do this for all of them? I ask because I
could see us going the route you've chosen for very few retroactive
additions of overrides, but use the CPUID approach (provided
there's a suitable Arm counterpart, and ideally some understanding
whether something similar could also be found for at least the two
planned new ports) for ABI additions going forward.

Jan
Paul Durrant Dec. 4, 2020, 8:22 a.m. UTC | #8
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: 04 December 2020 07:53
> To: paul@xen.org
> Cc: 'Paul Durrant' <pdurrant@amazon.com>; 'Eslam Elnikety' <elnikety@amazon.com>; 'Ian Jackson'
> <iwj@xenproject.org>; 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'Andrew
> Cooper' <andrew.cooper3@citrix.com>; 'George Dunlap' <george.dunlap@citrix.com>; 'Julien Grall'
> <julien@xen.org>; 'Stefano Stabellini' <sstabellini@kernel.org>; 'Christian Lindig'
> <christian.lindig@citrix.com>; 'David Scott' <dave@recoil.org>; 'Volodymyr Babchuk'
> <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné' <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v5 1/4] domctl: introduce a new domain create flag, XEN_DOMCTL_CDF_evtchn_fifo,
> ...
> 
> On 03.12.2020 18:07, Paul Durrant wrote:
> >> From: Jan Beulich <jbeulich@suse.com>
> >> Sent: 03 December 2020 15:57
> >>
> >> On 03.12.2020 16:45, Paul Durrant wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 03 December 2020 15:23
> >>>>
> >>>> On 03.12.2020 13:41, Paul Durrant wrote:
> >>>>> From: Paul Durrant <pdurrant@amazon.com>
> >>>>>
> >>>>> ...to control the visibility of the FIFO event channel operations
> >>>>> (EVTCHNOP_init_control, EVTCHNOP_expand_array, and EVTCHNOP_set_priority) to
> >>>>> the guest.
> >>>>>
> >>>>> These operations were added to the public header in commit d2d50c2f308f
> >>>>> ("evtchn: add FIFO-based event channel ABI") and the first implementation
> >>>>> appeared in the two subsequent commits: edc8872aeb4a ("evtchn: implement
> >>>>> EVTCHNOP_set_priority and add the set_priority hook") and 88910061ec61
> >>>>> ("evtchn: add FIFO-based event channel hypercalls and port ops"). Prior to
> >>>>> that, a guest issuing those operations would receive a return value of
> >>>>> -ENOSYS (not implemented) from Xen. Guests aware of the FIFO operations but
> >>>>> running on an older (pre-4.4) Xen would fall back to using the 2-level event
> >>>>> channel interface upon seeing this return value.
> >>>>>
> >>>>> Unfortunately the uncontrolable appearance of these new operations in Xen 4.4
> >>>>> onwards has implications for hibernation of some Linux guests. During resume
> >>>>> from hibernation, there are two kernels involved: the "boot" kernel and the
> >>>>> "resume" kernel. The guest boot kernel may default to use FIFO operations and
> >>>>> instruct Xen via EVTCHNOP_init_control to switch from 2-level to FIFO. On the
> >>>>> other hand, the resume kernel keeps assuming 2-level, because it was hibernated
> >>>>> on a version of Xen that did not support the FIFO operations.
> >>>>>
> >>>>> To maintain compatibility it is necessary to make Xen behave as it did
> >>>>> before the new operations were added and hence the code in this patch ensures
> >>>>> that, if XEN_DOMCTL_CDF_evtchn_fifo is not set, the FIFO event channel
> >>>>> operations will again result in -ENOSYS being returned to the guest.
> >>>>
> >>>> I have to admit I'm now even more concerned of the control for such
> >>>> going into Xen, the more with the now 2nd use in the subsequent patch.
> >>>> The implication of this would seem to be that whenever we add new
> >>>> hypercalls or sub-ops, a domain creation control would also need
> >>>> adding determining whether that new sub-op is actually okay to use by
> >>>> a guest. Or else I'd be keen to up front see criteria at least roughly
> >>>> outlined by which it could be established whether such an override
> >>>> control is needed.
> >>>>
> >>>
> >>> Ultimately I think any new hypercall (or related set of hypercalls) added to the ABI needs to be
> >> opt-in on a per-domain basis, so that we know that from when a domain is first created it will not
> see
> >> a change in its environment unless the VM administrator wants that to happen.
> >>
> >> A new hypercall appearing is a change to the guest's environment, yes,
> >> but a backwards compatible one. I don't see how this would harm a guest.
> >
> > Say we have a guest which is aware of the newer Xen and is coded to use the new hypercall *but* we
> start it on an older Xen where the new hypercall is not implemented. *Then* we migrate it to the newer
> Xen... the new hypercall suddenly becomes available and changes the guest behaviour. In the worst
> case, the guest is sufficiently confused that it crashes.
> 
> If a guest recognizes a new hypercall is unavailable, why would it ever try
> to make use of it again, unless it knows it may appear after having been
> migrated (which implies it's prepared for the sudden appearance)? This to
> me still looks like an attempt to work around guest bugs. And a halfhearted
> one, as you cherry pick just two out of many additions to the original 3.0
> ABI.
> 

This is exactly an attempt to work around guest bugs, which is something a cloud provider needs to do where possible.
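
As an aside, a minimal sketch of the probe-and-fallback logic being
discussed may help. It is loosely modelled on the way Linux selects its
event channel ABI at boot; setup_fifo_abi() and setup_2level_abi() are
illustrative placeholders rather than real kernel functions:

/*
 * Sketch only: a FIFO-aware kernel probes EVTCHNOP_init_control and
 * silently falls back to the 2-level ABI when the hypercall is
 * unavailable (pre-4.4 Xen, or a domain created without
 * XEN_DOMCTL_CDF_evtchn_fifo).
 */
static int guest_select_evtchn_abi(uint64_t control_gfn, unsigned int cpu)
{
    struct evtchn_init_control init_control = {
        .control_gfn = control_gfn,
        .offset      = 0,
        .vcpu        = cpu,
    };
    int rc = HYPERVISOR_event_channel_op(EVTCHNOP_init_control,
                                         &init_control);

    if ( rc == 0 )
        return setup_fifo_abi(init_control.link_bits); /* FIFO from now on */

    /* Any failure, including -ENOSYS: stay on the 2-level ABI. */
    return setup_2level_abi();
}

Hiding the FIFO sub-ops makes the probe fail again, which is exactly the
pre-4.4 behaviour the new flag is meant to restore.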

> >> This and ...
> >>
> >>>> I'm also not convinced such controls really want to be opt-in rather
> >>>> than opt-out.
> >>>
> >>> They really need to be opt-in I think. From a cloud provider PoV it is important that nothing in a
> >> customer's environment changes unless we want it to. Otherwise we have no way to deploy an updated
> >> hypervisor version without risking crashing their VMs.
> >>
> >> ... this sound to me more like workarounds for buggy guests than
> >> functionality the hypervisor _needs_ to have. (I can appreciate
> >> the specific case here for the specific scenario you provide as
> >> an exception.)
> >
> > If we want to have a hypervisor that can be used in a cloud environment
> > then Xen absolutely needs this capability.
> 
> As per above you can conclude that I'm still struggling to see the
> "why" part here.
> 

Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course.

Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer.

> >>>>> --- a/xen/arch/arm/domain.c
> >>>>> +++ b/xen/arch/arm/domain.c
> >>>>> @@ -622,7 +622,8 @@ int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>>>      unsigned int max_vcpus;
> >>>>>
> >>>>>      /* HVM and HAP must be set. IOMMU may or may not be */
> >>>>> -    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
> >>>>> +    if ( (config->flags &
> >>>>> +          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
> >>>>>           (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
> >>>>>      {
> >>>>>          dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
> >>>>> --- a/xen/arch/arm/domain_build.c
> >>>>> +++ b/xen/arch/arm/domain_build.c
> >>>>> @@ -2478,7 +2478,8 @@ void __init create_domUs(void)
> >>>>>          struct domain *d;
> >>>>>          struct xen_domctl_createdomain d_cfg = {
> >>>>>              .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
> >>>>> -            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>>>> +            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>>> +                     XEN_DOMCTL_CDF_evtchn_fifo,
> >>>>>              .max_evtchn_port = -1,
> >>>>>              .max_grant_frames = 64,
> >>>>>              .max_maptrack_frames = 1024,
> >>>>> --- a/xen/arch/arm/setup.c
> >>>>> +++ b/xen/arch/arm/setup.c
> >>>>> @@ -805,7 +805,8 @@ void __init start_xen(unsigned long boot_phys_offset,
> >>>>>      struct bootmodule *xen_bootmodule;
> >>>>>      struct domain *dom0;
> >>>>>      struct xen_domctl_createdomain dom0_cfg = {
> >>>>> -        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
> >>>>> +        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>>> +                 XEN_DOMCTL_CDF_evtchn_fifo,
> >>>>>          .max_evtchn_port = -1,
> >>>>>          .max_grant_frames = gnttab_dom0_frames(),
> >>>>>          .max_maptrack_frames = -1,
> >>>>> --- a/xen/arch/x86/setup.c
> >>>>> +++ b/xen/arch/x86/setup.c
> >>>>> @@ -738,7 +738,8 @@ static struct domain *__init create_dom0(const module_t *image,
> >>>>>                                           const char *loader)
> >>>>>  {
> >>>>>      struct xen_domctl_createdomain dom0_cfg = {
> >>>>> -        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
> >>>>> +        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
> >>>>> +                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
> >>>>>          .max_evtchn_port = -1,
> >>>>>          .max_grant_frames = -1,
> >>>>>          .max_maptrack_frames = -1,
> >>>>> --- a/xen/common/domain.c
> >>>>> +++ b/xen/common/domain.c
> >>>>> @@ -307,7 +307,7 @@ static int sanitise_domain_config(struct xen_domctl_createdomain *config)
> >>>>>           ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
> >>>>>             XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
> >>>>>             XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
> >>>>> -           XEN_DOMCTL_CDF_nested_virt) )
> >>>>> +           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
> >>>>>      {
> >>>>>          dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
> >>>>>          return -EINVAL;
> >>>>
> >>>> All of the hunks above point out a scalability issue if we were to
> >>>> follow this route for even just a fair part of new sub-ops, and I
> >>>> suppose you've noticed this with the next patch presumably touching
> >>>> all the same places again.
> >>>
> >>> Indeed. This solution works for now but is probably not what we want in the long run. Same goes
> for
> >> the current way we control viridian features via an HVM param. It is good enough for now IMO since
> >> domctl is not a stable interface. Any ideas about how we might implement a better interface in the
> >> longer term?
> >>
> >> While it has other downsides, Jürgen's proposal doesn't have any
> >> similar scalability issue afaics. Another possible model would
> >> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
> >> and derive their availability from a guest's CPUID policy. Of
> >> course this won't work when needing to retrofit guarding like
> >> you want to do here.
> >
> > Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.
> 
> Well, as said - there are also downsides with that approach. I'm
> not convinced it should be just the three of us to determine which
> one is better overall, the more that you don't seem to be convinced
> anyway (and I'm not going to claim I am, in either direction). It
> is my understanding that based on the claimed need for this (see
> above), this may become very widely used functionality, and if so
> we want to make sure we won't regret the route we went.
> 

Agreed, but we don't need the final top-to-bottom solution now. The only part of this that is introducing something that is stable is the libxl part, and I think the 'feature' bitmap is reasonably future-proof (being modelled on the viridian feature bitmap, which has been extended over several releases since it was introduced).
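
For illustration only, a sketch of how such a feature bitmap could be
folded into the domain-creation flags; GUEST_ABI_FEATURE_EVTCHN_FIFO and
abi_features_to_cdf_flags() are hypothetical names, not the identifiers
this series actually introduces:

/* Hypothetical mapping of a toolstack 'feature' bitmap onto CDF flags. */
#define GUEST_ABI_FEATURE_EVTCHN_FIFO  (1u << 0)

static uint32_t abi_features_to_cdf_flags(uint64_t features)
{
    uint32_t flags = 0;

    if ( features & GUEST_ABI_FEATURE_EVTCHN_FIFO )
        flags |= XEN_DOMCTL_CDF_evtchn_fifo;

    return flags;
}

New bits can then be appended to the bitmap in later releases without
touching the stable interface, which is the sense in which it is
future-proof.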

> For starters, just to get a better overall picture, besides the two
> overrides you add here, do you have any plans to retroactively add
> further controls for past ABI additions?

I don't have any specific plans. The two I deal with here are the causes of observed differences in guest behaviour, one being an actual crash and the other affecting PV driver behaviour, which may or may not be the cause of other crashes... but is still something we need to have control over.

> I don't suppose you have
> plans to consistently do this for all of them? I ask because I
> could see us going the route you've chosen for very few retroactive
> additions of overrides, but use the CPUID approach (provided
> there's a suitable Arm counterpart, and ideally some understanding
> whether something similar could also be found for at least the two
> planned new ports) for ABI additions going forward.
> 

That sounds reasonable and I'd be perfectly happy to rip and replace the use of domain creation flags when we have something 'better'. We don't need to wait for that though... this series is perfectly functional and doesn't take us down any one-way streets (below libxl) AFAICT.

  Paul

> Jan
Jan Beulich Dec. 4, 2020, 9:43 a.m. UTC | #9
On 04.12.2020 09:22, Paul Durrant wrote:
>> From: Jan Beulich <jbeulich@suse.com>
>> Sent: 04 December 2020 07:53
>>
>> On 03.12.2020 18:07, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 03 December 2020 15:57
>>>>
>>>> ... this sound to me more like workarounds for buggy guests than
>>>> functionality the hypervisor _needs_ to have. (I can appreciate
>>>> the specific case here for the specific scenario you provide as
>>>> an exception.)
>>>
>>> If we want to have a hypervisor that can be used in a cloud environment
>>> then Xen absolutely needs this capability.
>>
>> As per above you can conclude that I'm still struggling to see the
>> "why" part here.
>>
> 
> Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course.

That's a situation OSes are in all the time. Buggy applications may
stop working on newer OS versions. It's still the application that's
in need of updating then. I guess OSes may choose to work around
some very common applications' bugs, but I'd then wonder on what
basis "very common" gets established. I dislike the underlying
asymmetry / inconsistency (if not unfairness) of such a model,
despite seeing that there may be business reasons leading people to
think they want something like this.

> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer.

Boot the guest on a not-yet-upgraded host again, to update its kernel?

>>>> While it has other downsides, Jürgen's proposal doesn't have any
>>>> similar scalability issue afaics. Another possible model would
>>>> seem to be to key new hypercalls to hypervisor CPUID leaf bits,
>>>> and derive their availability from a guest's CPUID policy. Of
>>>> course this won't work when needing to retrofit guarding like
>>>> you want to do here.
>>>
> >> Ok, I'll take a look at hypfs as an immediate solution, if that's preferred.
>>
>> Well, as said - there are also downsides with that approach. I'm
>> not convinced it should be just the three of us to determine which
>> one is better overall, the more that you don't seem to be convinced
>> anyway (and I'm not going to claim I am, in either direction). It
>> is my understanding that based on the claimed need for this (see
>> above), this may become very widely used functionality, and if so
>> we want to make sure we won't regret the route we went.
>>
> 
> Agreed, but we don't need the final top-to-bottom solution now. The only part of this that is introducing something that is stable is the libxl part, and I think the 'feature' bitmap is reasonable future-proof (being modelled on the viridian feature bitmap, which has been extended over several releases since it was introduced).
> 
>> For starters, just to get a better overall picture, besides the two
>> overrides you add here, do you have any plans to retroactively add
>> further controls for past ABI additions?
> 
> I don't have any specific plans. The two I deal with here are the causes of observed differences in guest behaviour, one being an actual crash and the other affecting PV driver behaviour which may or may not be the cause of other crashes... but still something we need to have control over.

I.e. basically an arbitrary choice. This is again a symmetry /
consistency / fairness issue. If a guest admin pesters his cloud provider
enough, they may get the cloud provider to make hypervisor adjustments.
If another guest admin simply does the technically correct thing and
works out what needs fixing in the kernel, they may then figure they
simply didn't shout loud enough to avoid themselves needing to do much
work.

Anyway - I guess I'm about the only one seeing this from a purely
technical, non-business perspective. I suppose I'll continue looking at
the code for the purpose of finding issues (hopefully there aren't going
to be any), but will stay away from acking any parts of this. Whoever
agrees with the underlying concept can then provide their acks. (As said
elsewhere, in the particular case of the kexec issue with FIFO event
channels here, I could maybe see that as halfway acceptable
justification, albeit I did voice my concerns in that regard as well.
It's still working around a shortcoming in guest software.)

Nevertheless I'd like to gain clarity of what the plans are with future
ABI additions.

Jan
Julien Grall Dec. 4, 2020, 11:45 a.m. UTC | #10
Hi,

I haven't looked at the series yet. Just adding some thoughts on why one 
would want such option.

On 04/12/2020 09:43, Jan Beulich wrote:
> On 04.12.2020 09:22, Paul Durrant wrote:
>>> From: Jan Beulich <jbeulich@suse.com>
>>> Sent: 04 December 2020 07:53
>>>
>>> On 03.12.2020 18:07, Paul Durrant wrote:
>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>> Sent: 03 December 2020 15:57
>>>>>
>>>>> ... this sound to me more like workarounds for buggy guests than
>>>>> functionality the hypervisor _needs_ to have. (I can appreciate
>>>>> the specific case here for the specific scenario you provide as
>>>>> an exception.)
>>>>
>>>> If we want to have a hypervisor that can be used in a cloud environment
>>>> then Xen absolutely needs this capability.
>>>
>>> As per above you can conclude that I'm still struggling to see the
>>> "why" part here.
>>>
>>
>> Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course.
> 
> That's a situation OSes are in all the time. Buggy applications may
> stop working on newer OS versions. It's still the application that's
> in need of updating then. I guess OSes may choose to work around
> some very common applications' bugs, but I'd then wonder on what
> basis "very common" gets established. I dislike the underlying
> asymmetry / inconsistency (if not unfairness) of such a model,
> despite seeing that there may be business reasons leading people to
> think they want something like this.

The discussion seems to be geared towards buggy guests so far. However, 
this is not the only reason that one may want to avoid exposing some 
features:

    1) From the recent security issues (such as XSA-343), a knob to 
disable FIFO would be quite beneficial for vendors that don't need the 
feature.

    2) Fleet management purposes. You may have a fleet with multiple 
versions of Xen. You don't want your customer to start relying on 
features that may not be available on all the hosts; otherwise it 
complicates guest placement.

FAOD, I am sure there might be other features that need to be disabled. 
But we have to start somewhere :).

> 
>> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer.
> 
> Boot the guest on a not-yet-upgraded host again, to update its kernel?

You are making the assumption that the customer would have the choice to 
target a specific version of Xen. This may be undesirable for a cloud 
provider, as suddenly your customer may want to stick to the old version 
of Xen.
Andrew Cooper Dec. 4, 2020, 11:52 a.m. UTC | #11
On 04/12/2020 11:45, Julien Grall wrote:
> Hi,
>
> I haven't looked at the series yet. Just adding some thoughts on why
> one would want such option.
>
> On 04/12/2020 09:43, Jan Beulich wrote:
>> On 04.12.2020 09:22, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 04 December 2020 07:53
>>>>
>>>> On 03.12.2020 18:07, Paul Durrant wrote:
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 03 December 2020 15:57
>>>>>>
>>>>>> ... this sound to me more like workarounds for buggy guests than
>>>>>> functionality the hypervisor _needs_ to have. (I can appreciate
>>>>>> the specific case here for the specific scenario you provide as
>>>>>> an exception.)
>>>>>
>>>>> If we want to have a hypervisor that can be used in a cloud
>>>>> environment
>>>>> then Xen absolutely needs this capability.
>>>>
>>>> As per above you can conclude that I'm still struggling to see the
>>>> "why" part here.
>>>>
>>>
>>> Imagine you are a customer. You boot your OS and everything is just
>>> fine... you run your workload and all is good. You then shut down
>>> your VM and re-start it. Now it starts to crash. Who are you going
>>> to blame? You did nothing to your OS or application s/w, so you are
>>> going to blame the cloud provider of course.
>>
>> That's a situation OSes are in all the time. Buggy applications may
>> stop working on newer OS versions. It's still the application that's
>> in need of updating then. I guess OSes may choose to work around
>> some very common applications' bugs, but I'd then wonder on what
>> basis "very common" gets established. I dislike the underlying
>> asymmetry / inconsistency (if not unfairness) of such a model,
>> despite seeing that there may be business reasons leading people to
>> think they want something like this.
>
> The discussion seems to be geared towards buggy guest so far. However,
> this is not the only reason that one my want to avoid exposing some
> features:
>
>    1) From the recent security issues (such as XSA-343), a knob to
> disable FIFO would be quite beneficials for vendors that don't need
> the feature.
>
>    2) Fleet management purpose. You may have a fleet with multiple
> versions of Xen. You don't want your customer to start relying on
> features that may not be available on all the hosts otherwise it
> complicates the guest placement.
>
> FAOD, I am sure there might be other features that need to be
> disabled. But we have to start somewhere :).

Absolutely top of the list, importance wise, is so we can test different
configurations, without needing to rebuild the hypervisor (and to a
lesser extent, without having to reboot).

It is a mistake that events/grants/etc were ever available unilaterally
in HVM guests.  This is definitely a step in the right direction (but I
thought it would be too rude to ask Paul to make all of those CDF flags
at once).

~Andrew
Stefano Stabellini Dec. 4, 2020, 5:41 p.m. UTC | #12
On Fri, 4 Dec 2020, Andrew Cooper wrote:
> On 04/12/2020 11:45, Julien Grall wrote:
> > Hi,
> >
> > I haven't looked at the series yet. Just adding some thoughts on why
> > one would want such option.
> >
> > On 04/12/2020 09:43, Jan Beulich wrote:
> >> On 04.12.2020 09:22, Paul Durrant wrote:
> >>>> From: Jan Beulich <jbeulich@suse.com>
> >>>> Sent: 04 December 2020 07:53
> >>>>
> >>>> On 03.12.2020 18:07, Paul Durrant wrote:
> >>>>>> From: Jan Beulich <jbeulich@suse.com>
> >>>>>> Sent: 03 December 2020 15:57
> >>>>>>
> >>>>>> ... this sound to me more like workarounds for buggy guests than
> >>>>>> functionality the hypervisor _needs_ to have. (I can appreciate
> >>>>>> the specific case here for the specific scenario you provide as
> >>>>>> an exception.)
> >>>>>
> >>>>> If we want to have a hypervisor that can be used in a cloud
> >>>>> environment
> >>>>> then Xen absolutely needs this capability.
> >>>>
> >>>> As per above you can conclude that I'm still struggling to see the
> >>>> "why" part here.
> >>>>
> >>>
> >>> Imagine you are a customer. You boot your OS and everything is just
> >>> fine... you run your workload and all is good. You then shut down
> >>> your VM and re-start it. Now it starts to crash. Who are you going
> >>> to blame? You did nothing to your OS or application s/w, so you are
> >>> going to blame the cloud provider of course.
> >>
> >> That's a situation OSes are in all the time. Buggy applications may
> >> stop working on newer OS versions. It's still the application that's
> >> in need of updating then. I guess OSes may choose to work around
> >> some very common applications' bugs, but I'd then wonder on what
> >> basis "very common" gets established. I dislike the underlying
> >> asymmetry / inconsistency (if not unfairness) of such a model,
> >> despite seeing that there may be business reasons leading people to
> >> think they want something like this.
> >
> > The discussion seems to be geared towards buggy guest so far. However,
> > this is not the only reason that one my want to avoid exposing some
> > features:
> >
> >    1) From the recent security issues (such as XSA-343), a knob to
> > disable FIFO would be quite beneficials for vendors that don't need
> > the feature.
> >
> >    2) Fleet management purpose. You may have a fleet with multiple
> > versions of Xen. You don't want your customer to start relying on
> > features that may not be available on all the hosts otherwise it
> > complicates the guest placement.
> >
> > FAOD, I am sure there might be other features that need to be
> > disabled. But we have to start somewhere :).
> 
> Absolutely top of the list, importance wise, is so we can test different
> configurations, without needing to rebuild the hypervisor (and to a
> lesser extent, without having to reboot).
> 
> It is a mistake that events/grants/etc were ever available unilaterally
> in HVM guests.  This is definitely a step in the right direction (but I
> thought it would be too rude to ask Paul to make all of those CDF flags
> at once).

+1

For FuSa we'll need to be able to disable them at some point soon.
Andrew Cooper Dec. 4, 2020, 5:45 p.m. UTC | #13
On 04/12/2020 17:41, Stefano Stabellini wrote:
>>> FAOD, I am sure there might be other features that need to be
>>> disabled. But we have to start somewhere :).
>> Absolutely top of the list, importance wise, is so we can test different
>> configurations, without needing to rebuild the hypervisor (and to a
>> lesser extent, without having to reboot).
>>
>> It is a mistake that events/grants/etc were ever available unilaterally
>> in HVM guests.  This is definitely a step in the right direction (but I
>> thought it would be too rude to ask Paul to make all of those CDF flags
>> at once).
> +1
>
> For FuSa we'll need to be able to disable them at some point soon.

FWIW, I have a proper plan for this stuff, which starts alongside the
fixed toolstack ABI, and will cover all aspects of optional
functionality in a domain.

~Andrew
Durrant, Paul Dec. 4, 2020, 6:33 p.m. UTC | #14
> -----Original Message-----
> From: Andrew Cooper <andrew.cooper3@citrix.com>
> Sent: 04 December 2020 17:45
> To: Stefano Stabellini <sstabellini@kernel.org>
> Cc: Julien Grall <julien@xen.org>; Jan Beulich <jbeulich@suse.com>; paul@xen.org; Durrant, Paul
> <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; 'Ian Jackson' <iwj@xenproject.org>;
> 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'George Dunlap'
> <george.dunlap@citrix.com>; 'Christian Lindig' <christian.lindig@citrix.com>; 'David Scott'
> <dave@recoil.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné'
> <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
> Subject: RE: [EXTERNAL] [PATCH v5 1/4] domctl: introduce a new domain create flag,
> XEN_DOMCTL_CDF_evtchn_fifo, ...
> 
> 
> On 04/12/2020 17:41, Stefano Stabellini wrote:
> >>> FAOD, I am sure there might be other features that need to be
> >>> disabled. But we have to start somewhere :).
> >> Absolutely top of the list, importance wise, is so we can test different
> >> configurations, without needing to rebuild the hypervisor (and to a
> >> lesser extent, without having to reboot).
> >>
> >> It is a mistake that events/grants/etc were ever available unilaterally
> >> in HVM guests.  This is definitely a step in the right direction (but I
> >> thought it would be too rude to ask Paul to make all of those CDF flags
> >> at once).
> > +1
> >
> > For FuSa we'll need to be able to disable them at some point soon.
> 
> FWIW, I have a proper plan for this stuff, which start alongside the
> fixed toolstack ABI, and will cover all aspects of optional
> functionality in a domain.
> 

OK. Can we live with this series as it stands until that point? There is some urgency to get at least these two things fixed.

  Paul

> ~Andrew
Stefano Stabellini Dec. 5, 2020, 1:34 a.m. UTC | #15
On Fri, 4 Dec 2020, Durrant, Paul wrote:
> > -----Original Message-----
> > From: Andrew Cooper <andrew.cooper3@citrix.com>
> > Sent: 04 December 2020 17:45
> > To: Stefano Stabellini <sstabellini@kernel.org>
> > Cc: Julien Grall <julien@xen.org>; Jan Beulich <jbeulich@suse.com>; paul@xen.org; Durrant, Paul
> > <pdurrant@amazon.co.uk>; Elnikety, Eslam <elnikety@amazon.com>; 'Ian Jackson' <iwj@xenproject.org>;
> > 'Wei Liu' <wl@xen.org>; 'Anthony PERARD' <anthony.perard@citrix.com>; 'George Dunlap'
> > <george.dunlap@citrix.com>; 'Christian Lindig' <christian.lindig@citrix.com>; 'David Scott'
> > <dave@recoil.org>; 'Volodymyr Babchuk' <Volodymyr_Babchuk@epam.com>; 'Roger Pau Monné'
> > <roger.pau@citrix.com>; xen-devel@lists.xenproject.org
> > Subject: RE: [EXTERNAL] [PATCH v5 1/4] domctl: introduce a new domain create flag,
> > XEN_DOMCTL_CDF_evtchn_fifo, ...
> > 
> > 
> > On 04/12/2020 17:41, Stefano Stabellini wrote:
> > >>> FAOD, I am sure there might be other features that need to be
> > >>> disabled. But we have to start somewhere :).
> > >> Absolutely top of the list, importance wise, is so we can test different
> > >> configurations, without needing to rebuild the hypervisor (and to a
> > >> lesser extent, without having to reboot).
> > >>
> > >> It is a mistake that events/grants/etc were ever available unilaterally
> > >> in HVM guests.  This is definitely a step in the right direction (but I
> > >> thought it would be too rude to ask Paul to make all of those CDF flags
> > >> at once).
> > > +1
> > >
> > > For FuSa we'll need to be able to disable them at some point soon.
> > 
> > FWIW, I have a proper plan for this stuff, which start alongside the
> > fixed toolstack ABI, and will cover all aspects of optional
> > functionality in a domain.
> > 
> 
> OK. Can we live with this series as it stands until that point? There is some urgency to get at least these two things fixed.

I am happy to take things one step at a time, and this is a good step
forward.
Jan Beulich Dec. 7, 2020, 9:17 a.m. UTC | #16
On 04.12.2020 12:45, Julien Grall wrote:
> Hi,
> 
> I haven't looked at the series yet. Just adding some thoughts on why one 
> would want such option.
> 
> On 04/12/2020 09:43, Jan Beulich wrote:
>> On 04.12.2020 09:22, Paul Durrant wrote:
>>>> From: Jan Beulich <jbeulich@suse.com>
>>>> Sent: 04 December 2020 07:53
>>>>
>>>> On 03.12.2020 18:07, Paul Durrant wrote:
>>>>>> From: Jan Beulich <jbeulich@suse.com>
>>>>>> Sent: 03 December 2020 15:57
>>>>>>
>>>>>> ... this sound to me more like workarounds for buggy guests than
>>>>>> functionality the hypervisor _needs_ to have. (I can appreciate
>>>>>> the specific case here for the specific scenario you provide as
>>>>>> an exception.)
>>>>>
>>>>> If we want to have a hypervisor that can be used in a cloud environment
>>>>> then Xen absolutely needs this capability.
>>>>
>>>> As per above you can conclude that I'm still struggling to see the
>>>> "why" part here.
>>>>
>>>
>>> Imagine you are a customer. You boot your OS and everything is just fine... you run your workload and all is good. You then shut down your VM and re-start it. Now it starts to crash. Who are you going to blame? You did nothing to your OS or application s/w, so you are going to blame the cloud provider of course.
>>
>> That's a situation OSes are in all the time. Buggy applications may
>> stop working on newer OS versions. It's still the application that's
>> in need of updating then. I guess OSes may choose to work around
>> some very common applications' bugs, but I'd then wonder on what
>> basis "very common" gets established. I dislike the underlying
>> asymmetry / inconsistency (if not unfairness) of such a model,
>> despite seeing that there may be business reasons leading people to
>> think they want something like this.
> 
> The discussion seems to be geared towards buggy guest so far. However, 
> this is not the only reason that one my want to avoid exposing some 
> features:
> 
>     1) From the recent security issues (such as XSA-343), a knob to 
> disable FIFO would be quite beneficials for vendors that don't need the 
> feature.

Except that this wouldn't have been suitable as a during-embargo
mitigation, for its observability by guests.

>     2) Fleet management purpose. You may have a fleet with multiple 
> versions of Xen. You don't want your customer to start relying on 
> features that may not be available on all the hosts otherwise it 
> complicates the guest placement.

Guests incapable of running on older Xen are a problem in this regard
anyway, aren't they? And if they are capable, I don't see what
you're referring to.

> FAOD, I am sure there might be other features that need to be disabled. 
> But we have to start somewhere :).

If there is such a need, then yes, sure. But shouldn't we at least
gain rough agreement on how the future is going to look with
this? IOW have in hand some at least roughly agreed criteria by
which we could decide which new ABI additions will need some form
of override going forward (also allowing to judge which prior
additions may want to gain overrides in a retroactive fashion, or
in fact should have had ones from the beginning)?

>>> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer.
>>
>> Boot the guest on a not-yet-upgraded host again, to update its kernel?
> 
> You are making the assumption that the customer would have the choice to 
> target a specific versions of Xen. This may be undesirable for a cloud 
> provider as suddenly your customer may want to stick on the old version 
> of Xen.

I've gone from you saying "You really need a solution that can restore
the old VM environment, at least temporarily, for that customer." The
"temporarily" to me implies that it is at least an option to tie a
certain guest to a certain Xen version for in-guest upgrading purposes.
If the deal with the customer doesn't include running on a certain Xen
version, I don't see how this could have non-temporary consequences to
the cloud provider.

Jan
Julien Grall Dec. 7, 2020, 10:04 a.m. UTC | #17
Hi Jan,

On 07/12/2020 09:17, Jan Beulich wrote:
> On 04.12.2020 12:45, Julien Grall wrote:
>>      1) From the recent security issues (such as XSA-343), a knob to
>> disable FIFO would be quite beneficials for vendors that don't need the
>> feature.
> 
> Except that this wouldn't have been suitable as a during-embargo
> mitigation, for its observability by guests.

I think you are missing my point here. If that knob had been available 
before the security event, a vendor might already have decided to use it 
to disable the affected feature.

This vendor would not have needed to spend time to either hotpatch or 
reboot all its fleet.

>>      2) Fleet management purpose. You may have a fleet with multiple
>> versions of Xen. You don't want your customer to start relying on
>> features that may not be available on all the hosts otherwise it
>> complicates the guest placement.
> 
> Guests incapable to run on older Xen are a problem in this regard
> anyway, aren't they? And if they are capable, I don't see what
> you're referring to.

It is not about guests that cannot run on older Xen. It is more about 
allowing a guest to use a feature that is not yet widely available in 
the fleet (you don't update all the hosts in a night...).

Imagine the guest owner really wants to use feature A that is only 
available on the new Xen version. The owner may have noticed the feature 
on an existing running guest and would like to create a new guest that 
uses the feature.

It might be that there is no capacity available on the new Xen version, 
so the guest may start on capacity running the older one.

I can assure you that the owner will contact the customer service of the 
cloud provider to ask why the feature is not available on the new guest.

With a knob available, a cloud provider has more flexibility over when the 
feature can be exposed.

>> FAOD, I am sure there might be other features that need to be disabled.
>> But we have to start somewhere :).
> 
> If there is such a need, then yes, sure. But shouldn't we at least
> gain rough agreement on how the future is going to look like with
> this? IOW have in hands some at least roughly agreed criteria by
> which we could decide which new ABI additions will need some form
> of override going forward (also allowing to judge which prior
> additions may want to gain overrides in a retroactive fashion, or
> in fact should have had ones from the beginning)?

I think the answer is quite straight-forward: Anything exposed to the 
non-privileged (I include stubdomain) guest should have a knob to 
disable it.

> 
>>>> Now imagine you are the cloud provider, running Xen. What you did was start to upgrade your hosts from an older version of Xen to a newer version of Xen, to pick up various bug fixes and make sure you are running a version that is within the security support envelope. You identify that your customer's problem is a bug in their OS that was latent on the old version of the hypervisor but is now manifesting on the new one because it has buggy support for a hypercall that was added between the two versions. How are you going to fix this issue, and get your customer up and running again? Of course you'd like your customer to upgrade their OS, but they can't even boot it to do that. You really need a solution that can restore the old VM environment, at least temporarily, for that customer.
>>>
>>> Boot the guest on a not-yet-upgraded host again, to update its kernel?
>>
>> You are making the assumption that the customer would have the choice to
>> target a specific versions of Xen. This may be undesirable for a cloud
>> provider as suddenly your customer may want to stick on the old version
>> of Xen.
> 
> I've gone from you saying "You really need a solution that can restore
> the old VM environment, at least temporarily, for that customer." The
> "temporarily" to me implies that it is at least an option to tie a
> certain guest to a certain Xen version for in-guest upgrading purposes.
 >
> If the deal with the customer doesn't include running on a certain Xen
> version, I don't see how this could have non-temporary consequences to
> the cloud provider.

I think by "you", you mean Paul and not me?

Cheers,
Julien Grall Dec. 7, 2020, 10:07 a.m. UTC | #18
On 07/12/2020 10:04, Julien Grall wrote:
> Hi Jan,
> 
> On 07/12/2020 09:17, Jan Beulich wrote:
>> On 04.12.2020 12:45, Julien Grall wrote:
>>>      1) From the recent security issues (such as XSA-343), a knob to
>>> disable FIFO would be quite beneficials for vendors that don't need the
>>> feature.
>>
>> Except that this wouldn't have been suitable as a during-embargo
>> mitigation, for its observability by guests.
> 
> I think you are missing my point here. If that knob was available before 
> the security event, a vendor may have already decided to use it to 
> disable feature affected.
> 
> This vendor would not have needed to spend time to either hotpatch or 
> reboot all its fleet.
> 
>>>      2) Fleet management purpose. You may have a fleet with multiple
>>> versions of Xen. You don't want your customer to start relying on
>>> features that may not be available on all the hosts otherwise it
>>> complicates the guest placement.
>>
>> Guests incapable to run on older Xen are a problem in this regard
>> anyway, aren't they? And if they are capable, I don't see what
>> you're referring to.
> 
> It is not about guest that cannot run on older Xen. It is more about 
> allowing a guest to use a feature that is not yet widely available in 
> the fleet (you don't update all the hosts in a night...).
> 
> Imagine the guest owner really wants to use feature A that is only 
> available on new Xen version. The owner may have noticed the feature on 
> an existing running guest and would like to create a new guest that use 
> the feature.
> 
> It might be possible that there are no capacity available on the new Xen 
> version. So the guest may start on an older capacity.
> 
> I can assure you that the owner will contact the customer service of the 
> cloud provider to ask why the feature is not available on the new guest.
> 
> With a knob available, a cloud provider has more flexibility to when the 
> feature can be exposed.
> 
>>> FAOD, I am sure there might be other features that need to be disabled.
>>> But we have to start somewhere :).
>>
>> If there is such a need, then yes, sure. But shouldn't we at least
>> gain rough agreement on how the future is going to look like with
>> this? IOW have in hands some at least roughly agreed criteria by
>> which we could decide which new ABI additions will need some form
>> of override going forward (also allowing to judge which prior
>> additions may want to gain overrides in a retroactive fashion, or
>> in fact should have had ones from the beginning)?
> 
> I think the answer is quite straight-forward: Anything exposed to the 
> non-privileged (I include stubdomain) guest should have a knob to 

s/include/exclude/

> disable it.
> 
>>
>>>>> Now imagine you are the cloud provider, running Xen. What you did 
>>>>> was start to upgrade your hosts from an older version of Xen to a 
>>>>> newer version of Xen, to pick up various bug fixes and make sure 
>>>>> you are running a version that is within the security support 
>>>>> envelope. You identify that your customer's problem is a bug in 
>>>>> their OS that was latent on the old version of the hypervisor but 
>>>>> is now manifesting on the new one because it has buggy support for 
>>>>> a hypercall that was added between the two versions. How are you 
>>>>> going to fix this issue, and get your customer up and running 
>>>>> again? Of course you'd like your customer to upgrade their OS, but 
>>>>> they can't even boot it to do that. You really need a solution that 
>>>>> can restore the old VM environment, at least temporarily, for that 
>>>>> customer.
>>>>
>>>> Boot the guest on a not-yet-upgraded host again, to update its kernel?
>>>
>>> You are making the assumption that the customer would have the choice to
>>> target a specific versions of Xen. This may be undesirable for a cloud
>>> provider as suddenly your customer may want to stick on the old version
>>> of Xen.
>>
>> I've gone from you saying "You really need a solution that can restore
>> the old VM environment, at least temporarily, for that customer." The
>> "temporarily" to me implies that it is at least an option to tie a
>> certain guest to a certain Xen version for in-guest upgrading purposes.
>  >
>> If the deal with the customer doesn't include running on a certain Xen
>> version, I don't see how this could have non-temporary consequences to
>> the cloud provider.
> 
> I think by "you", you mean Paul and not me?
> 
> Cheers,
>
Jan Beulich Dec. 7, 2020, 10:15 a.m. UTC | #19
On 07.12.2020 11:04, Julien Grall wrote:
> On 07/12/2020 09:17, Jan Beulich wrote:
>> On 04.12.2020 12:45, Julien Grall wrote:
>>> You are making the assumption that the customer would have the choice to
>>> target a specific versions of Xen. This may be undesirable for a cloud
>>> provider as suddenly your customer may want to stick on the old version
>>> of Xen.
>>
>> I've gone from you saying "You really need a solution that can restore
>> the old VM environment, at least temporarily, for that customer." The
>> "temporarily" to me implies that it is at least an option to tie a
>> certain guest to a certain Xen version for in-guest upgrading purposes.
>  >
>> If the deal with the customer doesn't include running on a certain Xen
>> version, I don't see how this could have non-temporary consequences to
>> the cloud provider.
> 
> I think by "you", you mean Paul and not me?

Oh, right, I didn't pay attention to who wrote that text. Sorry.

Jan
Durrant, Paul Dec. 7, 2020, 10:23 a.m. UTC | #20
> -----Original Message-----
[snip]
> >> I've gone from you saying "You really need a solution that can restore
> >> the old VM environment, at least temporarily, for that customer." The
> >> "temporarily" to me implies that it is at least an option to tie a
> >> certain guest to a certain Xen version for in-guest upgrading purposes.
> >  >

Not necessarily.

> >> If the deal with the customer doesn't include running on a certain Xen
> >> version, I don't see how this could have non-temporary consequences to
> >> the cloud provider.
> >
> > I think by "you", you mean Paul and not me?
> 
> Oh, right, I didn't pay attention to who wrote that text. Sorry.
> 

By temporary I mean that we may want to time-limit turning off a certain part of the ABI because, whilst it is problematic for some customers, it could (and is likely to) have measurable benefits to others. Thus you keep the feature off only until any customers running an OS that has problems have upgraded their installations.

  Paul
diff mbox series

Patch

diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 321a13e519b5..3ca9f00d6d83 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -607,6 +607,7 @@  int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
             .max_evtchn_port = b_info->event_channels,
             .max_grant_frames = b_info->max_grant_frames,
             .max_maptrack_frames = b_info->max_maptrack_frames,
+            .flags = XEN_DOMCTL_CDF_evtchn_fifo,
         };
 
         if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/ocaml/libs/xc/xenctrl.ml b/tools/ocaml/libs/xc/xenctrl.ml
index e878699b0a1a..fa5c7b7eb0a2 100644
--- a/tools/ocaml/libs/xc/xenctrl.ml
+++ b/tools/ocaml/libs/xc/xenctrl.ml
@@ -65,6 +65,7 @@  type domain_create_flag =
 	| CDF_XS_DOMAIN
 	| CDF_IOMMU
 	| CDF_NESTED_VIRT
+	| CDF_EVTCHN_FIFO
 
 type domain_create_iommu_opts =
 	| IOMMU_NO_SHAREPT
diff --git a/tools/ocaml/libs/xc/xenctrl.mli b/tools/ocaml/libs/xc/xenctrl.mli
index e64907df8e7e..a872002d90cc 100644
--- a/tools/ocaml/libs/xc/xenctrl.mli
+++ b/tools/ocaml/libs/xc/xenctrl.mli
@@ -58,6 +58,7 @@  type domain_create_flag =
   | CDF_XS_DOMAIN
   | CDF_IOMMU
   | CDF_NESTED_VIRT
+  | CDF_EVTCHN_FIFO
 
 type domain_create_iommu_opts =
   | IOMMU_NO_SHAREPT
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 18cafcdda7b1..59f947370053 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -622,7 +622,8 @@  int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
     unsigned int max_vcpus;
 
     /* HVM and HAP must be set. IOMMU may or may not be */
-    if ( (config->flags & ~XEN_DOMCTL_CDF_iommu) !=
+    if ( (config->flags &
+          ~(XEN_DOMCTL_CDF_iommu | XEN_DOMCTL_CDF_evtchn_fifo)) !=
          (XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap) )
     {
         dprintk(XENLOG_INFO, "Unsupported configuration %#x\n",
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index e824ba34b012..13d1e79f1463 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2478,7 +2478,8 @@  void __init create_domUs(void)
         struct domain *d;
         struct xen_domctl_createdomain d_cfg = {
             .arch.gic_version = XEN_DOMCTL_CONFIG_GIC_NATIVE,
-            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+            .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                     XEN_DOMCTL_CDF_evtchn_fifo,
             .max_evtchn_port = -1,
             .max_grant_frames = 64,
             .max_maptrack_frames = 1024,
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 7fcff9af2a7e..0267acfca16e 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -805,7 +805,8 @@  void __init start_xen(unsigned long boot_phys_offset,
     struct bootmodule *xen_bootmodule;
     struct domain *dom0;
     struct xen_domctl_createdomain dom0_cfg = {
-        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap,
+        .flags = XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
+                 XEN_DOMCTL_CDF_evtchn_fifo,
         .max_evtchn_port = -1,
         .max_grant_frames = gnttab_dom0_frames(),
         .max_maptrack_frames = -1,
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 30d6f375a3af..e558241c73da 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -738,7 +738,8 @@  static struct domain *__init create_dom0(const module_t *image,
                                          const char *loader)
 {
     struct xen_domctl_createdomain dom0_cfg = {
-        .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+        .flags = XEN_DOMCTL_CDF_evtchn_fifo |
+                 (IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0),
         .max_evtchn_port = -1,
         .max_grant_frames = -1,
         .max_maptrack_frames = -1,
diff --git a/xen/common/domain.c b/xen/common/domain.c
index f748806a450b..28592c7c8486 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -307,7 +307,7 @@  static int sanitise_domain_config(struct xen_domctl_createdomain *config)
          ~(XEN_DOMCTL_CDF_hvm | XEN_DOMCTL_CDF_hap |
            XEN_DOMCTL_CDF_s3_integrity | XEN_DOMCTL_CDF_oos_off |
            XEN_DOMCTL_CDF_xs_domain | XEN_DOMCTL_CDF_iommu |
-           XEN_DOMCTL_CDF_nested_virt) )
+           XEN_DOMCTL_CDF_nested_virt | XEN_DOMCTL_CDF_evtchn_fifo) )
     {
         dprintk(XENLOG_INFO, "Unknown CDF flags %#x\n", config->flags);
         return -EINVAL;
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index dbfba62a4934..91133bf3c263 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1188,10 +1188,27 @@  static long evtchn_set_priority(const struct evtchn_set_priority *set_priority)
     return ret;
 }
 
+static bool is_fifo_op(int cmd)
+{
+    switch ( cmd )
+    {
+    case EVTCHNOP_init_control:
+    case EVTCHNOP_expand_array:
+    case EVTCHNOP_set_priority:
+        return true;
+    default:
+        return false;
+    }
+}
+
 long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     long rc;
 
+    if ( !(current->domain->options & XEN_DOMCTL_CDF_evtchn_fifo) &&
+         is_fifo_op(cmd) )
+        return -ENOSYS;
+
     switch ( cmd )
     {
     case EVTCHNOP_alloc_unbound: {
@@ -1568,9 +1585,10 @@  static void domain_dump_evtchn_info(struct domain *d)
     unsigned int port;
     int irq;
 
-    printk("Event channel information for domain %d:\n"
-           "Polling vCPUs: {%*pbl}\n"
-           "    port [p/m/s]\n", d->domain_id, d->max_vcpus, d->poll_mask);
+    printk("Event channel information for %pd:\n", d);
+    printk("ABI: %s\n", d->evtchn_fifo ? "FIFO" : "2-level");
+    printk("Polling vCPUs: {%*pbl}\n"
+           "    port [p/m/s]\n", d->max_vcpus, d->poll_mask);
 
     spin_lock(&d->event_lock);
 
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 666aeb71bf1b..f7149c81a7c2 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -70,9 +70,11 @@  struct xen_domctl_createdomain {
 #define XEN_DOMCTL_CDF_iommu          (1U<<_XEN_DOMCTL_CDF_iommu)
 #define _XEN_DOMCTL_CDF_nested_virt   6
 #define XEN_DOMCTL_CDF_nested_virt    (1U << _XEN_DOMCTL_CDF_nested_virt)
+#define _XEN_DOMCTL_CDF_evtchn_fifo   7
+#define XEN_DOMCTL_CDF_evtchn_fifo    (1U << _XEN_DOMCTL_CDF_evtchn_fifo)
 
 /* Max XEN_DOMCTL_CDF_* constant.  Used for ABI checking. */
-#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_nested_virt
+#define XEN_DOMCTL_CDF_MAX XEN_DOMCTL_CDF_evtchn_fifo
 
     uint32_t flags;