diff mbox series

[v2] evtchn: make support for different ABIs tunable

Message ID 20190807174223.67590-1-elnikety@amazon.com (mailing list archive)
State New, archived
Headers show
Series [v2] evtchn: make support for different ABIs tunable | expand

Commit Message

Eslam Elnikety Aug. 7, 2019, 5:42 p.m. UTC
Support for FIFO event channel ABI was first introduced in Xen 4.4
(see 88910061ec6). Make this support tunable, since the choice of which
event channel ABI has implications for hibernation. Consider resuming a
pre Xen 4.4 hibernated Linux guest. During resume from hibernation, there
are two kernels involved: the "boot" kernel and the "resume" kernel. The
guest boot kernel defaults to FIFO ABI and instructs Xen via an
EVTCHNOP_init_control to switch from 2L to FIFO. On the other hand, the
resume kernel keeps assuming 2L. This, in turn, results in Xen and the
resumed kernel talking past each other (due to different protocols FIFO
vs 2L). A knob to control the support level for event channel ABIs per
guest would allow an administrator to work around this issue. This patch
introduces this knob.

In order to announce to guests that the event channel ABI does not support
FIFO, the hypervisor returns EPERM on init_control operation. When this
operation fails, the guest should continue to use the 2L event channel ABI.
For example, in Linux drivers/xen/events/events_base.c:

    if (fifo_events)
        ret = xen_evtchn_fifo_init();
    if (ret < 0)
        xen_evtchn_2l_init();

and xen_evtchn_fifo_init fails when EVTCHNOP_init_control fails. This commit
does not change the current default behaviour: announce FIFO event channels
ABI support for guests unless explicitly stated otherwise at domaincreate.

Signed-off-by: Eslam Elnikety <elnikety@amazon.com>
---
    Changes in v2:
        - Reworded the commit message
        - Return EPERM instead of ENOSYS when forcing 2L ABI
        - Use d->options instead of replicating the flag
---
 docs/man/xl.cfg.5.pod.in    | 5 +++++
 tools/libxl/libxl.h         | 8 ++++++++
 tools/libxl/libxl_create.c  | 5 +++++
 tools/libxl/libxl_types.idl | 1 +
 tools/xl/xl_parse.c         | 2 ++
 tools/xl/xl_sxp.c           | 2 ++
 xen/common/event_channel.c  | 5 +++++
 xen/include/public/domctl.h | 3 +++
 8 files changed, 31 insertions(+)

Comments

Jan Beulich Aug. 8, 2019, 1:27 p.m. UTC | #1
On 07.08.2019 19:42, Eslam Elnikety wrote:
> Support for FIFO event channel ABI was first introduced in Xen 4.4
> (see 88910061ec6). Make this support tunable, since the choice of which
> event channel ABI has implications for hibernation. Consider resuming a
> pre Xen 4.4 hibernated Linux guest. During resume from hibernation, there
> are two kernels involved: the "boot" kernel and the "resume" kernel. The
> guest boot kernel defaults to FIFO ABI and instructs Xen via an
> EVTCHNOP_init_control to switch from 2L to FIFO.

This should only be "may default to" - there's nothing making this
to be the case unconditionally afaict. Another option would be to
start the sentence "If the guest boot kernel defaults ...".

> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1170,6 +1170,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>   
>       case EVTCHNOP_init_control: {
>           struct evtchn_init_control init_control;
> +
> +        /* Fail init_control for domains that must use 2l ABI */
> +        if ( current->domain->options & XEN_DOMCTL_CDF_disable_fifo )
> +            return -EPERM;
> +
>           if ( copy_from_guest(&init_control, arg, 1) != 0 )
>               return -EFAULT;
>           rc = evtchn_fifo_init_control(&init_control);

I think the check would better go into evtchn_fifo_init_control(),
at least as long as the setting really is FIFO-centric. Also the
comment will become stale the moment a 3rd evtchn mechanism appears
- it shouldn't mention 2L as the setting isn't "2-level only". Then
again you may actually want it to behave like this, and hence be
named accordingly.

Irrespective of these remarks, and along the lines of comments on
the v1 thread, introducing wider control that would also allow
disabling 2-level (for HVM guests) would seem better to me. This
would then hopefully be coded up in a way that would make extending
it straightforward, once a 3rd mechanism appears.

Jan
Eslam Elnikety Aug. 8, 2019, 2:31 p.m. UTC | #2
Hi Jan,

On 8. Aug 2019, at 15:27, Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>> wrote:

On 07.08.2019 19:42, Eslam Elnikety wrote:
Support for FIFO event channel ABI was first introduced in Xen 4.4
(see 88910061ec6). Make this support tunable, since the choice of which
event channel ABI has implications for hibernation. Consider resuming a
pre Xen 4.4 hibernated Linux guest. During resume from hibernation, there
are two kernels involved: the "boot" kernel and the "resume" kernel. The
guest boot kernel defaults to FIFO ABI and instructs Xen via an
EVTCHNOP_init_control to switch from 2L to FIFO.

This should only be "may default to" - there's nothing making this
to be the case unconditionally afaict. Another option would be to
start the sentence "If the guest boot kernel defaults …".

Correct. v3 will use “If the guest book kernel defaults … “.


--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1170,6 +1170,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)

     case EVTCHNOP_init_control: {
         struct evtchn_init_control init_control;
+
+        /* Fail init_control for domains that must use 2l ABI */
+        if ( current->domain->options & XEN_DOMCTL_CDF_disable_fifo )
+            return -EPERM;
+
         if ( copy_from_guest(&init_control, arg, 1) != 0 )
             return -EFAULT;
         rc = evtchn_fifo_init_control(&init_control);

I think the check would better go into evtchn_fifo_init_control(),
at least as long as the setting really is FIFO-centric. Also the

Not sure. It is the FIFO ABI that defines EVTCHNOP_init_control (not defined in 2L). Short-circuiting the hypercall at this place seems more appropriate. Moreover, doing copy_from_guest and calling into evtchn_fifo_init_control only to return error is not optimal.

comment will become stale the moment a 3rd evtchn mechanism appears
- it shouldn't mention 2L as the setting isn't "2-level only". Then
again you may actually want it to behave like this, and hence be
named accordingly.

Correct. Will reword the comment to: /* Fail init_control for domains that cannot use FIFO ABI */


Irrespective of these remarks, and along the lines of comments on
the v1 thread, introducing wider control that would also allow
disabling 2-level (for HVM guests) would seem better to me. This
would then hopefully be coded up in a way that would make extending
it straightforward, once a 3rd mechanism appears.

Hmmm... we cannot force guests to call init_control (in order to flip from 2L to FIFO). Quoting from [1] 4.4 “If this call (EVTCHNOP_init_control) fails on the boot VCPU, the guest should continue to use the 2-level event channel ABI for all VCPUs.” Support for 2L ABI does not sound like something that can be optional. Once a 3rd mechanism appears, I imagine adding a corresponding domctl flag to control such mechanism.

[1] https://xenbits.xenproject.org/people/dvrabel/event-channels-F.pdf


Jan
Jan Beulich Aug. 8, 2019, 3:12 p.m. UTC | #3
On 08.08.2019 16:31,  Elnikety, Eslam  wrote:

First of all - can you please try to get your reply quoting
improved, such that readers can actually tell your replies
from context? (I didn't check whether perhaps your mail was
a HTML one, and my plain text reading of it discarded the
markings. If so - please don't send HTML mail.)

> On 8. Aug 2019, at 15:27, Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>> wrote:
> On 07.08.2019 19:42, Eslam Elnikety wrote:
> --- a/xen/common/event_channel.c
> +++ b/xen/common/event_channel.c
> @@ -1170,6 +1170,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> 
>       case EVTCHNOP_init_control: {
>           struct evtchn_init_control init_control;
> +
> +        /* Fail init_control for domains that must use 2l ABI */
> +        if ( current->domain->options & XEN_DOMCTL_CDF_disable_fifo )
> +            return -EPERM;
> +
>           if ( copy_from_guest(&init_control, arg, 1) != 0 )
>               return -EFAULT;
>           rc = evtchn_fifo_init_control(&init_control);
> 
> I think the check would better go into evtchn_fifo_init_control(),
> at least as long as the setting really is FIFO-centric. Also the
> 
> Not sure. It is the FIFO ABI that defines EVTCHNOP_init_control
> (not defined in 2L). Short-circuiting the hypercall at this place
> seems more appropriate.

I'd expect any 3rd variant to also have a need for an init-control
operation, and hence at that point this would become a hook like
many of the other ops. And at that point the check would need to
move anyway. Hence it's better to put it in its long term
designated place right away.

> Moreover, doing copy_from_guest and calling into
> evtchn_fifo_init_control only to return error is not optimal.

True, yet from my pov the more logical alternative code structure
is still preferable.

> Irrespective of these remarks, and along the lines of comments on
> the v1 thread, introducing wider control that would also allow
> disabling 2-level (for HVM guests) would seem better to me. This
> would then hopefully be coded up in a way that would make extending
> it straightforward, once a 3rd mechanism appears.
> 
> Hmmm... we cannot force guests to call init_control (in order to flip from 2L to FIFO). Quoting from [1] 4.4 “If this call (EVTCHNOP_init_control) fails on the boot VCPU, the guest should continue to use the 2-level event channel ABI for all VCPUs.” Support for 2L ABI does not sound like something that can be optional. Once a 3rd mechanism appears, I imagine adding a corresponding domctl flag to control such mechanism.

For HVM, event channels as a whole should be optional; we shouldn't
default a possible control for this to off though. For PV, the
2-level interface indeed has to be considered mandatory. Hence
today there are these (theoretically) possible combinations

		PV		HVM
nothing		invalid		valid
2-level only	valid		valid
FIFO only	???		valid
both		valid		valid

Jan
Eslam Elnikety Aug. 8, 2019, 6:54 p.m. UTC | #4
Hi Jan,

On 08.08.19 17:12, Jan Beulich wrote:
> On 08.08.2019 16:31,  Elnikety, Eslam  wrote:
> 
> First of all - can you please try to get your reply quoting
> improved, such that readers can actually tell your replies
> from context? (I didn't check whether perhaps your mail was
> a HTML one, and my plain text reading of it discarded the
> markings. If so - please don't send HTML mail.)

Oopsy. It was HTML. I will be more diligent going forward :)

> 
>> On 8. Aug 2019, at 15:27, Jan Beulich <JBeulich@suse.com<mailto:JBeulich@suse.com>> wrote:
>> On 07.08.2019 19:42, Eslam Elnikety wrote:
>> --- a/xen/common/event_channel.c
>> +++ b/xen/common/event_channel.c
>> @@ -1170,6 +1170,11 @@ long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>
>>        case EVTCHNOP_init_control: {
>>            struct evtchn_init_control init_control;
>> +
>> +        /* Fail init_control for domains that must use 2l ABI */
>> +        if ( current->domain->options & XEN_DOMCTL_CDF_disable_fifo )
>> +            return -EPERM;
>> +
>>            if ( copy_from_guest(&init_control, arg, 1) != 0 )
>>                return -EFAULT;
>>            rc = evtchn_fifo_init_control(&init_control);
>>
>> I think the check would better go into evtchn_fifo_init_control(),
>> at least as long as the setting really is FIFO-centric. Also the
>>
>> Not sure. It is the FIFO ABI that defines EVTCHNOP_init_control
>> (not defined in 2L). Short-circuiting the hypercall at this place
>> seems more appropriate.
> 
> I'd expect any 3rd variant to also have a need for an init-control
> operation, and hence at that point this would become a hook like
> many of the other ops. And at that point the check would need to
> move anyway. Hence it's better to put it in its long term
> designated place right away.
> 
>> Moreover, doing copy_from_guest and calling into
>> evtchn_fifo_init_control only to return error is not optimal.
> 
> True, yet from my pov the more logical alternative code structure
> is still preferable.

Fair point. On a second thought, the additional cycles are not a problem 
(given that init_control is not on a critical __must be performant__ 
path). I am now also in favor of moving the check to 
event_fifo_init_control.

> 
>> Irrespective of these remarks, and along the lines of comments on
>> the v1 thread, introducing wider control that would also allow
>> disabling 2-level (for HVM guests) would seem better to me. This
>> would then hopefully be coded up in a way that would make extending
>> it straightforward, once a 3rd mechanism appears.
>>
>> Hmmm... we cannot force guests to call init_control (in order to flip from 2L to FIFO). Quoting from [1] 4.4 “If this call (EVTCHNOP_init_control) fails on the boot VCPU, the guest should continue to use the 2-level event channel ABI for all VCPUs.” Support for 2L ABI does not sound like something that can be optional. Once a 3rd mechanism appears, I imagine adding a corresponding domctl flag to control such mechanism.
> 
> For HVM, event channels as a whole should be optional; we shouldn't
> default a possible control for this to off though. For PV, the
> 2-level interface indeed has to be considered mandatory. Hence
> today there are these (theoretically) possible combinations
> 
> 		PV		HVM
> nothing		invalid		valid
> 2-level only	valid		valid

The patch at hand here is concerned only with the above line. (In fact, 
v3 will rename the subject to: "evtchn: introduce a knob to on/off FIFO 
ABI per guest".)

I take it that the concern here is that whatever changes the patch 
proposes should play nicely with potential future changes introducing 
such a generic framework. Any concrete suggestions?

Thanks,
Eslam

> FIFO only	???		valid
> both		valid		valid



> 
> Jan
>
Jan Beulich Aug. 9, 2019, 7:12 a.m. UTC | #5
On 08.08.2019 20:54, Eslam Elnikety wrote:
> I take it that the concern here is that whatever changes the patch
> proposes should play nicely with potential future changes introducing
> such a generic framework. Any concrete suggestions?

Well, involved domctl(s) can be changed at any time, so the
main question is whether at the guest config level this wants
to be expressed in a more forward compatible manner. But that's
a question I'd like to defer to people more familiar with the
general principles of how guest config file handling should be
extended.

Jan
diff mbox series

Patch

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c99d40307e..f204d8b4f0 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1262,6 +1262,11 @@  FIFO-based event channel ABI support up to 131,071 event channels.
 Other guests are limited to 4095 (64-bit x86 and ARM) or 1023 (32-bit
 x86).
 
+=item B<disable_evtchn_fifo=BOOLEAN>
+
+Indicates if support for FIFO event channel ABI is disabled. The default
+is false (0).
+
 =item B<vdispl=[ "VDISPL_SPEC_STRING", "VDISPL_SPEC_STRING", ...]>
 
 Specifies the virtual display devices to be provided to the guest.
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 9bacfb97f0..9ee80d74a6 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -169,6 +169,14 @@ 
  */
 #define LIBXL_HAVE_BUILDINFO_EVENT_CHANNELS 1
 
+/*
+ * LIBXL_HAVE_BUILDINFO_DISABLE_EVTCHN_FIFO indicates that the
+ * libxl_domain_build_info structure contains a boolean
+ * disable_evtchn_fifo which instructs libxl to enable/disable
+ * support for FIFO event channel ABI at create time.
+ */
+#define LIBXL_HAVE_BUILDINFO_DISABLE_EVTCHN_FIFO 1
+
 /*
  * libxl_domain_build_info has the u.hvm.ms_vm_genid field.
  */
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 03ce166f4f..aa87e45643 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -217,6 +217,8 @@  int libxl__domain_build_info_setdefault(libxl__gc *gc,
     if (!b_info->event_channels)
         b_info->event_channels = 1023;
 
+    libxl_defbool_setdefault(&b_info->disable_evtchn_fifo, false);
+
     libxl__arch_domain_build_info_setdefault(gc, b_info);
     libxl_defbool_setdefault(&b_info->dm_restrict, false);
 
@@ -564,6 +566,9 @@  int libxl__domain_make(libxl__gc *gc, libxl_domain_config *d_config,
                 libxl_defbool_val(info->oos) ? 0 : XEN_DOMCTL_CDF_oos_off;
         }
 
+        if (libxl_defbool_val(b_info->disable_evtchn_fifo))
+            create.flags |= XEN_DOMCTL_CDF_disable_fifo;
+
         /* Ultimately, handle is an array of 16 uint8_t, same as uuid */
         libxl_uuid_copy(ctx, (libxl_uuid *)&create.handle, &info->uuid);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index b61399ce36..5f30570443 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -521,6 +521,7 @@  libxl_domain_build_info = Struct("domain_build_info",[
     ("iomem",            Array(libxl_iomem_range, "num_iomem")),
     ("claim_mode",	     libxl_defbool),
     ("event_channels",   uint32),
+    ("disable_evtchn_fifo", libxl_defbool),
     ("kernel",           string),
     ("cmdline",          string),
     ("ramdisk",          string),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index e105bda2bb..bcf16c31d4 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1496,6 +1496,8 @@  void parse_config_data(const char *config_source,
     if (!xlu_cfg_get_long(config, "max_event_channels", &l, 0))
         b_info->event_channels = l;
 
+    xlu_cfg_get_defbool(config, "disable_evtchn_fifo",
+                        &b_info->disable_evtchn_fifo, 0);
     xlu_cfg_replace_string (config, "kernel", &b_info->kernel, 0);
     xlu_cfg_replace_string (config, "ramdisk", &b_info->ramdisk, 0);
     xlu_cfg_replace_string (config, "device_tree", &b_info->device_tree, 0);
diff --git a/tools/xl/xl_sxp.c b/tools/xl/xl_sxp.c
index 359a001570..52e98c6c61 100644
--- a/tools/xl/xl_sxp.c
+++ b/tools/xl/xl_sxp.c
@@ -71,6 +71,8 @@  void printf_info_sexp(int domid, libxl_domain_config *d_config, FILE *fh)
     fprintf(fh, "\t(target_memkb %"PRId64")\n", b_info->target_memkb);
     fprintf(fh, "\t(nomigrate %s)\n",
            libxl_defbool_to_string(b_info->disable_migrate));
+    fprintf(fh, "\t(disable_evtchn_fifo %s)\n",
+           libxl_defbool_to_string(b_info->disable_evtchn_fifo));
 
     if (c_info->type == LIBXL_DOMAIN_TYPE_PV && b_info->bootloader) {
         fprintf(fh, "\t(bootloader %s)\n", b_info->bootloader);
diff --git a/xen/common/event_channel.c b/xen/common/event_channel.c
index e86e2bfab0..0df58dee53 100644
--- a/xen/common/event_channel.c
+++ b/xen/common/event_channel.c
@@ -1170,6 +1170,11 @@  long do_event_channel_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 
     case EVTCHNOP_init_control: {
         struct evtchn_init_control init_control;
+
+        /* Fail init_control for domains that must use 2l ABI */
+        if ( current->domain->options & XEN_DOMCTL_CDF_disable_fifo )
+            return -EPERM;
+
         if ( copy_from_guest(&init_control, arg, 1) != 0 )
             return -EFAULT;
         rc = evtchn_fifo_init_control(&init_control);
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 19486d5e32..654b4fdd22 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -64,6 +64,9 @@  struct xen_domctl_createdomain {
  /* Is this a xenstore domain? */
 #define _XEN_DOMCTL_CDF_xs_domain     4
 #define XEN_DOMCTL_CDF_xs_domain      (1U<<_XEN_DOMCTL_CDF_xs_domain)
+ /* Disable FIFO event channels? */
+#define _XEN_DOMCTL_CDF_disable_fifo  5
+#define XEN_DOMCTL_CDF_disable_fifo   (1U<<_XEN_DOMCTL_CDF_disable_fifo)
     uint32_t flags;
 
     /*