diff mbox series

[v14,21/23] x86/virt/tdx: Handle TDX interaction with ACPI S3 and deeper states

Message ID 7daec6d20bf93c2ff87268866d112ee8efd44e01.1697532085.git.kai.huang@intel.com (mailing list archive)
State New, archived
Headers show
Series TDX host kernel support | expand

Commit Message

Huang, Kai Oct. 17, 2023, 10:14 a.m. UTC
TDX cannot survive from S3 and deeper states.  The hardware resets and
disables TDX completely when platform goes to S3 and deeper.  Both TDX
guests and the TDX module get destroyed permanently.

The kernel uses S3 to support suspend-to-ram, and S4 or deeper states to
support hibernation.  The kernel also maintains TDX states to track
whether it has been initialized and its metadata resource, etc.  After
resuming from S3 or hibernation, these TDX states won't be correct
anymore.

Theoretically, the kernel can do more complicated things like resetting
TDX internal states and TDX module metadata before going to S3 or
deeper, and re-initialize TDX module after resuming, etc, but there is
no way to save/restore TDX guests for now.

Until TDX supports full save and restore of TDX guests, there is no big
value to handle TDX module in suspend and hibernation alone.  To make
things simple, just choose to make TDX mutually exclusive with S3 and
hibernation.

Note the TDX module is initialized at runtime.  To avoid having to deal
with the fuss of determining TDX state at runtime, just choose TDX vs S3
and hibernation at kernel early boot.  It's a bad user experience if the
choice of TDX and S3/hibernation is done at runtime anyway, i.e., the
user can experience being able to do S3/hibernation but later becoming
unable to due to TDX being enabled.

Disable TDX in kernel early boot when hibernation is available, and give
a message telling the user to disable hibernation via kernel command
line in order to use TDX.  Currently there's no mechanism exposed by the
hibernation code to allow other kernel code to disable hibernation once
for all.

Disable ACPI S3 by setting acpi_suspend_lowlevel function pointer to
NULL when TDX is enabled by the BIOS.  This avoids having to modify the
ACPI code to disable ACPI S3 in other ways.

Also give a message telling the user to disable TDX in the BIOS in order
to use ACPI S3.  A new kernel command line can be added in the future if
there's a need to let user disable TDX host via kernel command line.

Signed-off-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---

v13 -> v14:
 - New patch

---
 arch/x86/virt/vmx/tdx/tdx.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Comments

Rafael J. Wysocki Oct. 17, 2023, 10:53 a.m. UTC | #1
On Tue, Oct 17, 2023 at 12:17 PM Kai Huang <kai.huang@intel.com> wrote:
>
> TDX cannot survive from S3 and deeper states.  The hardware resets and
> disables TDX completely when platform goes to S3 and deeper.  Both TDX
> guests and the TDX module get destroyed permanently.
>
> The kernel uses S3 to support suspend-to-ram, and S4 or deeper states to
> support hibernation.  The kernel also maintains TDX states to track
> whether it has been initialized and its metadata resource, etc.  After
> resuming from S3 or hibernation, these TDX states won't be correct
> anymore.
>
> Theoretically, the kernel can do more complicated things like resetting
> TDX internal states and TDX module metadata before going to S3 or
> deeper, and re-initialize TDX module after resuming, etc, but there is
> no way to save/restore TDX guests for now.
>
> Until TDX supports full save and restore of TDX guests, there is no big
> value to handle TDX module in suspend and hibernation alone.  To make
> things simple, just choose to make TDX mutually exclusive with S3 and
> hibernation.
>
> Note the TDX module is initialized at runtime.  To avoid having to deal
> with the fuss of determining TDX state at runtime, just choose TDX vs S3
> and hibernation at kernel early boot.  It's a bad user experience if the
> choice of TDX and S3/hibernation is done at runtime anyway, i.e., the
> user can experience being able to do S3/hibernation but later becoming
> unable to due to TDX being enabled.
>
> Disable TDX in kernel early boot when hibernation is available, and give
> a message telling the user to disable hibernation via kernel command
> line in order to use TDX.  Currently there's no mechanism exposed by the
> hibernation code to allow other kernel code to disable hibernation once
> for all.
>
> Disable ACPI S3 by setting acpi_suspend_lowlevel function pointer to
> NULL when TDX is enabled by the BIOS.  This avoids having to modify the
> ACPI code to disable ACPI S3 in other ways.
>
> Also give a message telling the user to disable TDX in the BIOS in order
> to use ACPI S3.  A new kernel command line can be added in the future if
> there's a need to let user disable TDX host via kernel command line.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>
> v13 -> v14:
>  - New patch
>
> ---
>  arch/x86/virt/vmx/tdx/tdx.c | 23 +++++++++++++++++++++++
>  1 file changed, 23 insertions(+)
>
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index e82f0adeea4d..1d0f1045dd33 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -28,10 +28,12 @@
>  #include <linux/sort.h>
>  #include <linux/log2.h>
>  #include <linux/reboot.h>
> +#include <linux/suspend.h>
>  #include <asm/msr-index.h>
>  #include <asm/msr.h>
>  #include <asm/page.h>
>  #include <asm/special_insns.h>
> +#include <asm/acpi.h>
>  #include <asm/tdx.h>
>  #include "tdx.h"
>
> @@ -1427,6 +1429,22 @@ static int __init tdx_init(void)
>                 return -ENODEV;
>         }
>
> +#define HIBERNATION_MSG                \
> +       "Disable TDX due to hibernation is available. Use 'nohibernate' command line to disable hibernation."

I'm not sure if this new symbol is really necessary.

The message could be as simple as "Initialization failed: Hibernation
support is enabled" (assuming a properly defined pr_fmt()), because
that carries enough information about the reason for the failure IMO.

How to address it can be documented elsewhere.

> +       /*
> +        * Note hibernation_available() can vary when it is called at
> +        * runtime as it checks secretmem_active() and cxl_mem_active()
> +        * which can both vary at runtime.  But here at early_init() they
> +        * both cannot return true, thus when hibernation_available()
> +        * returns false here, hibernation is disabled by either
> +        * 'nohibernate' or LOCKDOWN_HIBERNATION security lockdown,
> +        * which are both permanent.
> +        */

IIUC, the role of the comment is to document the fact that it is OK to
use hiberation_available() here, because it cannot return "false"
intermittently at this point, so I would just say "At this point,
hibernation_available() indicates whether or not hibernation support
has been permanently disabled", without going into all of the details
(which are irrelevant IMO and may change in the future).

> +       if (hibernation_available()) {
> +               pr_err("initialization failed: %s\n", HIBERNATION_MSG);
> +               return -ENODEV;
> +       }
> +
>         err = register_memory_notifier(&tdx_memory_nb);
>         if (err) {
>                 pr_err("initialization failed: register_memory_notifier() failed (%d)\n",
> @@ -1442,6 +1460,11 @@ static int __init tdx_init(void)
>                 return -ENODEV;
>         }
>
> +#ifdef CONFIG_ACPI
> +       pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use ACPI S3.\n");
> +       acpi_suspend_lowlevel = NULL;
> +#endif

It would be somewhat nicer to have a helper for setting this pointer.

> +
>         /*
>          * Just use the first TDX KeyID as the 'global KeyID' and
>          * leave the rest for TDX guests.
> --
Huang, Kai Oct. 18, 2023, 3:22 a.m. UTC | #2
Hi Rafael,
Thanks for feedback!
> 


> > @@ -1427,6 +1429,22 @@ static int __init tdx_init(void)
> >                 return -ENODEV;
> >         }
> > 
> > +#define HIBERNATION_MSG                \
> > +       "Disable TDX due to hibernation is available. Use 'nohibernate'
command line to disable hibernation."
> 
> I'm not sure if this new symbol is really necessary.
> 
> The message could be as simple as "Initialization failed: Hibernation
> support is enabled" (assuming a properly defined pr_fmt()), because
> that carries enough information about the reason for the failure IMO.
> 
> How to address it can be documented elsewhere.


The last patch of this series is the documentation patch to add TDX host.  We
can add a sentence to suggest the user to use 'nohibernate' kernel command line
when one sees TDX gets disabled because of hibernation being available.

But isn't better to just provide such information together in the dmesg so the
user can immediately know how to resolve this issue? 

If user only sees "... failed: Hibernation support is enabled", then the user
will need additional knowledge to know where to look for the solution first, and
only after that, the user can know how to resolve this.

> 
> > +       /*
> > +        * Note hibernation_available() can vary when it is called at
> > +        * runtime as it checks secretmem_active() and cxl_mem_active()
> > +        * which can both vary at runtime.  But here at early_init() they
> > +        * both cannot return true, thus when hibernation_available()
> > +        * returns false here, hibernation is disabled by either
> > +        * 'nohibernate' or LOCKDOWN_HIBERNATION security lockdown,
> > +        * which are both permanent.
> > +        */
> 
> IIUC, the role of the comment is to document the fact that it is OK to
> use hiberation_available() here, because it cannot return "false"
> intermittently at this point, so I would just say "At this point,
> hibernation_available() indicates whether or not hibernation support
> has been permanently disabled", without going into all of the details
> (which are irrelevant IMO and may change in the future).


Agreed.  Will do.  Thanks.

> 
> > +       if (hibernation_available()) {
> > +               pr_err("initialization failed: %s\n", HIBERNATION_MSG);
> > +               return -ENODEV;
> > +       }
> > +
> >         err = register_memory_notifier(&tdx_memory_nb);
> >         if (err) {
> >                 pr_err("initialization failed: register_memory_notifier()
failed (%d)\n",
> > @@ -1442,6 +1460,11 @@ static int __init tdx_init(void)
> >                 return -ENODEV;
> >         }
> > 
> > +#ifdef CONFIG_ACPI
> > +       pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use
ACPI S3.\n");
> > +       acpi_suspend_lowlevel = NULL;
> > +#endif
> 
> It would be somewhat nicer to have a helper for setting this pointer.
> 


OK.  Currently Xen PV dom0 also overrides the acpi_suspend_lowlevel.

Do you want the helper introduced now together with this series, or it is
acceptable to have a patch later after TDX gets merged to add a helper and
change both Xen and TDX code to use the helper?

Anyway, I suppose you mean we provide a helper in the ACPI code, and call that
helper here in TDX code.

Just in case you want the helper now, then I think it's better to have two
patches to do below ?

 1) A patch to introduce the helper, and change the Xen code to use it.
 2) The current TDX patch here, but change to use the new helper to set the
    acpi_suspend_lowlevel

I made the incremental diff to cover above based on this patch (see below,
compile tested only).  And the TDX part change will be split out as mentioned
above.

Do you have any comments?

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index c8a7fc23f63c..e71bff60d647 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -60,8 +60,10 @@ static inline void acpi_disable_pci(void)
        acpi_noirq_set();
 }
 
-/* Low-level suspend routine. */
-extern int (*acpi_suspend_lowlevel)(void);
+typedef int (*acpi_suspend_lowlevel_t)(void);
+
+/* Set up low-level suspend routine. */
+void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func);
 
 /* Physical address to resume after wakeup */
 unsigned long acpi_get_wakeup_address(void);
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2a0ea38955df..95be371305c6 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -779,11 +779,17 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
 void (*__acpi_unregister_gsi)(u32 gsi) = NULL;
 
 #ifdef CONFIG_ACPI_SLEEP
-int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
+static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
 #else
-int (*acpi_suspend_lowlevel)(void);
+static int (*acpi_suspend_lowlevel)(void);
 #endif
 
+/* To override the default acpi_suspend_lowlevel */
+void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func)
+{
+       acpi_suspend_lowlevel = func;
+}
+
 /*
  * success: return IRQ number (>=0)
  * failure: return < 0
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 38ec6815a42a..c8586bee4650 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1565,7 +1565,7 @@ static int __init tdx_init(void)
 
 #ifdef CONFIG_ACPI
        pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use ACPI
S3.\n");
-       acpi_suspend_lowlevel = NULL;
+       acpi_set_suspend_lowlevel(NULL);
 #endif
 
        /*
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index b1e11863144d..81a1b6ee8fc2 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -64,7 +64,7 @@ static inline void xen_acpi_sleep_register(void)
                acpi_os_set_prepare_extended_sleep(
                        &xen_acpi_notify_hypervisor_extended_sleep);
 
-               acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
+               acpi_set_suspend_lowlevel(xen_acpi_suspend_lowlevel);
        }
 }
 #else


>
Rafael J. Wysocki Oct. 18, 2023, 10:15 a.m. UTC | #3
On Wed, Oct 18, 2023 at 5:22 AM Huang, Kai <kai.huang@intel.com> wrote:
>
> Hi Rafael,
> Thanks for feedback!
> >
>
>
> > > @@ -1427,6 +1429,22 @@ static int __init tdx_init(void)
> > >                 return -ENODEV;
> > >         }
> > >
> > > +#define HIBERNATION_MSG                \
> > > +       "Disable TDX due to hibernation is available. Use 'nohibernate'
> command line to disable hibernation."
> >
> > I'm not sure if this new symbol is really necessary.
> >
> > The message could be as simple as "Initialization failed: Hibernation
> > support is enabled" (assuming a properly defined pr_fmt()), because
> > that carries enough information about the reason for the failure IMO.
> >
> > How to address it can be documented elsewhere.
>
>
> The last patch of this series is the documentation patch to add TDX host.  We
> can add a sentence to suggest the user to use 'nohibernate' kernel command line
> when one sees TDX gets disabled because of hibernation being available.
>
> But isn't better to just provide such information together in the dmesg so the
> user can immediately know how to resolve this issue?
>
> If user only sees "... failed: Hibernation support is enabled", then the user
> will need additional knowledge to know where to look for the solution first, and
> only after that, the user can know how to resolve this.

I would expect anyone interested in a given feature to get familiar
with its documentation in the first place.  If they neglect to do that
and then find this message, it is absolutely fair to expect them to go
and look into the documentation after all.

> >
> > > +       /*
> > > +        * Note hibernation_available() can vary when it is called at
> > > +        * runtime as it checks secretmem_active() and cxl_mem_active()
> > > +        * which can both vary at runtime.  But here at early_init() they
> > > +        * both cannot return true, thus when hibernation_available()
> > > +        * returns false here, hibernation is disabled by either
> > > +        * 'nohibernate' or LOCKDOWN_HIBERNATION security lockdown,
> > > +        * which are both permanent.
> > > +        */
> >
> > IIUC, the role of the comment is to document the fact that it is OK to
> > use hiberation_available() here, because it cannot return "false"
> > intermittently at this point, so I would just say "At this point,
> > hibernation_available() indicates whether or not hibernation support
> > has been permanently disabled", without going into all of the details
> > (which are irrelevant IMO and may change in the future).
>
>
> Agreed.  Will do.  Thanks.
>
> >
> > > +       if (hibernation_available()) {
> > > +               pr_err("initialization failed: %s\n", HIBERNATION_MSG);
> > > +               return -ENODEV;
> > > +       }
> > > +
> > >         err = register_memory_notifier(&tdx_memory_nb);
> > >         if (err) {
> > >                 pr_err("initialization failed: register_memory_notifier()
> failed (%d)\n",
> > > @@ -1442,6 +1460,11 @@ static int __init tdx_init(void)
> > >                 return -ENODEV;
> > >         }
> > >
> > > +#ifdef CONFIG_ACPI
> > > +       pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use
> ACPI S3.\n");
> > > +       acpi_suspend_lowlevel = NULL;
> > > +#endif
> >
> > It would be somewhat nicer to have a helper for setting this pointer.
> >
>
>
> OK.  Currently Xen PV dom0 also overrides the acpi_suspend_lowlevel.
>
> Do you want the helper introduced now together with this series, or it is
> acceptable to have a patch later after TDX gets merged to add a helper and
> change both Xen and TDX code to use the helper?
>
> Anyway, I suppose you mean we provide a helper in the ACPI code, and call that
> helper here in TDX code.

Yes, I do.

> Just in case you want the helper now, then I think it's better to have two
> patches to do below ?
>
>  1) A patch to introduce the helper, and change the Xen code to use it.
>  2) The current TDX patch here, but change to use the new helper to set the
>     acpi_suspend_lowlevel

Yes, this makes sense to me.

> I made the incremental diff to cover above based on this patch (see below,
> compile tested only).  And the TDX part change will be split out as mentioned
> above.
>
> Do you have any comments?
>
> diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
> index c8a7fc23f63c..e71bff60d647 100644
> --- a/arch/x86/include/asm/acpi.h
> +++ b/arch/x86/include/asm/acpi.h
> @@ -60,8 +60,10 @@ static inline void acpi_disable_pci(void)
>         acpi_noirq_set();
>  }
>
> -/* Low-level suspend routine. */
> -extern int (*acpi_suspend_lowlevel)(void);
> +typedef int (*acpi_suspend_lowlevel_t)(void);
> +
> +/* Set up low-level suspend routine. */
> +void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func);

I'm not sure about the typededf, but I have no strong opinion against it either.

>
>  /* Physical address to resume after wakeup */
>  unsigned long acpi_get_wakeup_address(void);
> diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> index 2a0ea38955df..95be371305c6 100644
> --- a/arch/x86/kernel/acpi/boot.c
> +++ b/arch/x86/kernel/acpi/boot.c
> @@ -779,11 +779,17 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
>  void (*__acpi_unregister_gsi)(u32 gsi) = NULL;
>
>  #ifdef CONFIG_ACPI_SLEEP
> -int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> +static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
>  #else
> -int (*acpi_suspend_lowlevel)(void);
> +static int (*acpi_suspend_lowlevel)(void);

For the sake of consistency, either use the typedef here, or don't use
it at all.

>  #endif
>
> +/* To override the default acpi_suspend_lowlevel */
> +void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func)
> +{
> +       acpi_suspend_lowlevel = func;
> +}
> +
>  /*
>   * success: return IRQ number (>=0)
>   * failure: return < 0
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index 38ec6815a42a..c8586bee4650 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1565,7 +1565,7 @@ static int __init tdx_init(void)
>
>  #ifdef CONFIG_ACPI
>         pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use ACPI
> S3.\n");
> -       acpi_suspend_lowlevel = NULL;
> +       acpi_set_suspend_lowlevel(NULL);
>  #endif
>
>         /*
> diff --git a/include/xen/acpi.h b/include/xen/acpi.h
> index b1e11863144d..81a1b6ee8fc2 100644
> --- a/include/xen/acpi.h
> +++ b/include/xen/acpi.h
> @@ -64,7 +64,7 @@ static inline void xen_acpi_sleep_register(void)
>                 acpi_os_set_prepare_extended_sleep(
>                         &xen_acpi_notify_hypervisor_extended_sleep);
>
> -               acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
> +               acpi_set_suspend_lowlevel(xen_acpi_suspend_lowlevel);
>         }
>  }
>  #else

Otherwise LGTM.
Huang, Kai Oct. 18, 2023, 10:51 a.m. UTC | #4
On Wed, 2023-10-18 at 12:15 +0200, Rafael J. Wysocki wrote:
> On Wed, Oct 18, 2023 at 5:22 AM Huang, Kai <kai.huang@intel.com> wrote:
> > 
> > Hi Rafael,
> > Thanks for feedback!
> > > 
> > 
> > 
> > > > @@ -1427,6 +1429,22 @@ static int __init tdx_init(void)
> > > >                 return -ENODEV;
> > > >         }
> > > > 
> > > > +#define HIBERNATION_MSG                \
> > > > +       "Disable TDX due to hibernation is available. Use 'nohibernate'
> > command line to disable hibernation."
> > > 
> > > I'm not sure if this new symbol is really necessary.
> > > 
> > > The message could be as simple as "Initialization failed: Hibernation
> > > support is enabled" (assuming a properly defined pr_fmt()), because
> > > that carries enough information about the reason for the failure IMO.
> > > 
> > > How to address it can be documented elsewhere.
> > 
> > 
> > The last patch of this series is the documentation patch to add TDX host.  We
> > can add a sentence to suggest the user to use 'nohibernate' kernel command line
> > when one sees TDX gets disabled because of hibernation being available.
> > 
> > But isn't better to just provide such information together in the dmesg so the
> > user can immediately know how to resolve this issue?
> > 
> > If user only sees "... failed: Hibernation support is enabled", then the user
> > will need additional knowledge to know where to look for the solution first, and
> > only after that, the user can know how to resolve this.
> 
> I would expect anyone interested in a given feature to get familiar
> with its documentation in the first place.  If they neglect to do that
> and then find this message, it is absolutely fair to expect them to go
> and look into the documentation after all.

OK.  I'll remove HIBERNATION_MSG and just print the message suggested by you.

And in the documentation patch, add one sentence to tell user when this happens,
add 'nohibernate' to resolve.


[...]

> > 
> > -/* Low-level suspend routine. */
> > -extern int (*acpi_suspend_lowlevel)(void);
> > +typedef int (*acpi_suspend_lowlevel_t)(void);
> > +
> > +/* Set up low-level suspend routine. */
> > +void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func);
> 
> I'm not sure about the typededf, but I have no strong opinion against it either.
> 
> > 
> >  /* Physical address to resume after wakeup */
> >  unsigned long acpi_get_wakeup_address(void);
> > diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> > index 2a0ea38955df..95be371305c6 100644
> > --- a/arch/x86/kernel/acpi/boot.c
> > +++ b/arch/x86/kernel/acpi/boot.c
> > @@ -779,11 +779,17 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
> >  void (*__acpi_unregister_gsi)(u32 gsi) = NULL;
> > 
> >  #ifdef CONFIG_ACPI_SLEEP
> > -int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> > +static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> >  #else
> > -int (*acpi_suspend_lowlevel)(void);
> > +static int (*acpi_suspend_lowlevel)(void);
> 
> For the sake of consistency, either use the typedef here, or don't use
> it at all.

Ah right.

Since you don't prefer the typedef, I'll abandon it:

E.g,:

void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void))
{
	acpi_suspend_lowlevel = suspend_lowlevel;
}

Let me know whether this looks good to you?

[...]

> 
> Otherwise LGTM.

Thanks. I'll split the helper patch out and include it to the next version of
this series.
Rafael J. Wysocki Oct. 18, 2023, 10:53 a.m. UTC | #5
On Wed, Oct 18, 2023 at 12:51 PM Huang, Kai <kai.huang@intel.com> wrote:
>
> On Wed, 2023-10-18 at 12:15 +0200, Rafael J. Wysocki wrote:
> > On Wed, Oct 18, 2023 at 5:22 AM Huang, Kai <kai.huang@intel.com> wrote:
> > >
> > > Hi Rafael,
> > > Thanks for feedback!
> > > >
> > >
> > >
> > > > > @@ -1427,6 +1429,22 @@ static int __init tdx_init(void)
> > > > >                 return -ENODEV;
> > > > >         }
> > > > >
> > > > > +#define HIBERNATION_MSG                \
> > > > > +       "Disable TDX due to hibernation is available. Use 'nohibernate'
> > > command line to disable hibernation."
> > > >
> > > > I'm not sure if this new symbol is really necessary.
> > > >
> > > > The message could be as simple as "Initialization failed: Hibernation
> > > > support is enabled" (assuming a properly defined pr_fmt()), because
> > > > that carries enough information about the reason for the failure IMO.
> > > >
> > > > How to address it can be documented elsewhere.
> > >
> > >
> > > The last patch of this series is the documentation patch to add TDX host.  We
> > > can add a sentence to suggest the user to use 'nohibernate' kernel command line
> > > when one sees TDX gets disabled because of hibernation being available.
> > >
> > > But isn't better to just provide such information together in the dmesg so the
> > > user can immediately know how to resolve this issue?
> > >
> > > If user only sees "... failed: Hibernation support is enabled", then the user
> > > will need additional knowledge to know where to look for the solution first, and
> > > only after that, the user can know how to resolve this.
> >
> > I would expect anyone interested in a given feature to get familiar
> > with its documentation in the first place.  If they neglect to do that
> > and then find this message, it is absolutely fair to expect them to go
> > and look into the documentation after all.
>
> OK.  I'll remove HIBERNATION_MSG and just print the message suggested by you.
>
> And in the documentation patch, add one sentence to tell user when this happens,
> add 'nohibernate' to resolve.
>
>
> [...]
>
> > >
> > > -/* Low-level suspend routine. */
> > > -extern int (*acpi_suspend_lowlevel)(void);
> > > +typedef int (*acpi_suspend_lowlevel_t)(void);
> > > +
> > > +/* Set up low-level suspend routine. */
> > > +void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func);
> >
> > I'm not sure about the typededf, but I have no strong opinion against it either.
> >
> > >
> > >  /* Physical address to resume after wakeup */
> > >  unsigned long acpi_get_wakeup_address(void);
> > > diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> > > index 2a0ea38955df..95be371305c6 100644
> > > --- a/arch/x86/kernel/acpi/boot.c
> > > +++ b/arch/x86/kernel/acpi/boot.c
> > > @@ -779,11 +779,17 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
> > >  void (*__acpi_unregister_gsi)(u32 gsi) = NULL;
> > >
> > >  #ifdef CONFIG_ACPI_SLEEP
> > > -int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> > > +static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> > >  #else
> > > -int (*acpi_suspend_lowlevel)(void);
> > > +static int (*acpi_suspend_lowlevel)(void);
> >
> > For the sake of consistency, either use the typedef here, or don't use
> > it at all.
>
> Ah right.
>
> Since you don't prefer the typedef, I'll abandon it:
>
> E.g,:
>
> void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void))
> {
>         acpi_suspend_lowlevel = suspend_lowlevel;
> }
>
> Let me know whether this looks good to you?

Yes, this is fine with me.

> [...]
>
> >
> > Otherwise LGTM.
>
> Thanks. I'll split the helper patch out and include it to the next version of
> this series.
>
Huang, Kai Oct. 19, 2023, 8:45 p.m. UTC | #6
> > 
> > > > 
> > > > -/* Low-level suspend routine. */
> > > > -extern int (*acpi_suspend_lowlevel)(void);
> > > > +typedef int (*acpi_suspend_lowlevel_t)(void);
> > > > +
> > > > +/* Set up low-level suspend routine. */
> > > > +void acpi_set_suspend_lowlevel(acpi_suspend_lowlevel_t func);
> > > 
> > > I'm not sure about the typededf, but I have no strong opinion against it either.
> > > 
> > > > 
> > > >  /* Physical address to resume after wakeup */
> > > >  unsigned long acpi_get_wakeup_address(void);
> > > > diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
> > > > index 2a0ea38955df..95be371305c6 100644
> > > > --- a/arch/x86/kernel/acpi/boot.c
> > > > +++ b/arch/x86/kernel/acpi/boot.c
> > > > @@ -779,11 +779,17 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
> > > >  void (*__acpi_unregister_gsi)(u32 gsi) = NULL;
> > > > 
> > > >  #ifdef CONFIG_ACPI_SLEEP
> > > > -int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> > > > +static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
> > > >  #else
> > > > -int (*acpi_suspend_lowlevel)(void);
> > > > +static int (*acpi_suspend_lowlevel)(void);
> > > 
> > > For the sake of consistency, either use the typedef here, or don't use
> > > it at all.
> > 
> > Ah right.
> > 
> > Since you don't prefer the typedef, I'll abandon it:
> > 
> > E.g,:
> > 
> > void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void))
> > {
> >         acpi_suspend_lowlevel = suspend_lowlevel;
> > }
> > 
> > Let me know whether this looks good to you?
> 
> Yes, this is fine with me.
> 

Hi Rafael,

Sorry for the back and forth on this.

With the patch which provides the helper, LKP reported build error:

   drivers/acpi/sleep.c: In function 'acpi_suspend_enter':
>> drivers/acpi/sleep.c:600:22: error: 'acpi_suspend_lowlevel' undeclared (first
use in this function); did you mean 'acpi_set_suspend_lowlevel'?
     600 |                 if (!acpi_suspend_lowlevel)
         |                      ^~~~~~~~~~~~~~~~~~~~~
         |                      acpi_set_suspend_lowlevel

Turns out I disabled both suspend/hibernation in my own kernel build test, sigh.

The common ACPI sleep code requires the ARCH to declare 'acpi_suspend_lowlevel'
in <asm/acpi.h>, and define it somewhere in the ARCH code  too.

So sadly I cannot remove the acpi_suspend_lowlevel variable declaration in
<asm/acpi.h>.  And the ending patch would have both below in <asm/acpi.h>:

   /* Low-level suspend routine. */
   extern int (*acpi_suspend_lowlevel)(void);

  +/* To override @acpi_suspend_lowlevel at early boot */
  +void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void));
  +

Thus I am not sure whether it still a good idea to have the helper?

Any comments?  Thanks!

Below is the full patch that I end up for your reference:

From 9372e0278e2a14419d08e8df36e474dc72d93784 Mon Sep 17 00:00:00 2001
From: Kai Huang <kai.huang@intel.com>
Date: Thu, 19 Oct 2023 14:37:19 +1300
Subject: [PATCH] x86/acpi: Add a helper to override ACPI lowlevel suspend
 function

ACPI S3 suspend code requires a valid 'acpi_suspend_lowlevel' function
pointer to work.  Each ARCH needs to set the acpi_suspend_lowlevel to
its own implementation to make ACPI S3 suspend work on that ARCH.  X86
implements a default function for that, and Xen PV dom0 overrides it
with its own version during early kernel boot.

Intel Trusted Domain Extensions (TDX) doesn't play nice with ACPI S3.
ACPI S3 suspend will gets disabled during kernel early boot if TDX is
enabled.

Add a helper function to override the acpi_suspend_lowlevel at kernel
early boot, so that the callers don't manipulate the function pointer
directly.  Change the Xen code to use the helper.  It will be used by
TDX code to disable ACPI S3 suspend too.

No functional change is intended.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/acpi.h | 3 +++
 arch/x86/kernel/acpi/boot.c | 9 +++++++--
 include/xen/acpi.h          | 2 +-
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index c8a7fc23f63c..6001df87526e 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -63,6 +63,9 @@ static inline void acpi_disable_pci(void)
 /* Low-level suspend routine. */
 extern int (*acpi_suspend_lowlevel)(void);

+/* To override @acpi_suspend_lowlevel at early boot */
+void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void));
+
 /* Physical address to resume after wakeup */
 unsigned long acpi_get_wakeup_address(void);

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2a0ea38955df..e9143a8a0350 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -779,11 +779,16 @@ int (*__acpi_register_gsi)(struct device *dev, u32 gsi,
 void (*__acpi_unregister_gsi)(u32 gsi) = NULL;

 #ifdef CONFIG_ACPI_SLEEP
-int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
+static int (*acpi_suspend_lowlevel)(void) = x86_acpi_suspend_lowlevel;
 #else
-int (*acpi_suspend_lowlevel)(void);
+static int (*acpi_suspend_lowlevel)(void);
 #endif

+void __init acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void))
+{
+       acpi_suspend_lowlevel = suspend_lowlevel;
+}
+
 /*
  * success: return IRQ number (>=0)
  * failure: return < 0
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index b1e11863144d..81a1b6ee8fc2 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -64,7 +64,7 @@ static inline void xen_acpi_sleep_register(void)
                acpi_os_set_prepare_extended_sleep(
                        &xen_acpi_notify_hypervisor_extended_sleep);

-               acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
+               acpi_set_suspend_lowlevel(xen_acpi_suspend_lowlevel);
        }
 }
 #else
Huang, Kai Oct. 24, 2023, 10:46 a.m. UTC | #7
> 
> Hi Rafael,
> 
> Sorry for the back and forth on this.
> 
> With the patch which provides the helper, LKP reported build error:
> 
>    drivers/acpi/sleep.c: In function 'acpi_suspend_enter':
> > > drivers/acpi/sleep.c:600:22: error: 'acpi_suspend_lowlevel' undeclared (first
> use in this function); did you mean 'acpi_set_suspend_lowlevel'?
>      600 |                 if (!acpi_suspend_lowlevel)
>          |                      ^~~~~~~~~~~~~~~~~~~~~
>          |                      acpi_set_suspend_lowlevel
> 
> Turns out I disabled both suspend/hibernation in my own kernel build test, sigh.
> 
> The common ACPI sleep code requires the ARCH to declare 'acpi_suspend_lowlevel'
> in <asm/acpi.h>, and define it somewhere in the ARCH code  too.
> 
> So sadly I cannot remove the acpi_suspend_lowlevel variable declaration in
> <asm/acpi.h>.  And the ending patch would have both below in <asm/acpi.h>:
> 
>    /* Low-level suspend routine. */
>    extern int (*acpi_suspend_lowlevel)(void);
> 
>   +/* To override @acpi_suspend_lowlevel at early boot */
>   +void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void));
>   +
> 
> Thus I am not sure whether it still a good idea to have the helper?
> 

Hi Rafael,

Could you help to take a look whether you still want the helper?

Ans sorry in previous reply I pasted an old/wrong patch, please ignore.  Below
is the patch that can build successfully. 


x86/acpi: Add a helper to override ACPI lowlevel suspend function

ACPI S3 suspend code requires a valid 'acpi_suspend_lowlevel' function
pointer to work.  Each ARCH needs to set the acpi_suspend_lowlevel to
its own implementation to make ACPI S3 suspend work on that ARCH.  X86
implements a default function for that, and Xen PV dom0 overrides it
with its own version during early kernel boot.

Intel Trusted Domain Extensions (TDX) doesn't play nice with ACPI S3.
ACPI S3 suspend will gets disabled during kernel early boot if TDX is
enabled.

Add a helper function to override the acpi_suspend_lowlevel at kernel
early boot, so that the callers don't manipulate the function pointer
directly.  Change the Xen code to use the helper.  It will be used by
TDX code to disable ACPI S3 suspend too.

No functional change is intended.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/acpi.h | 3 +++
 arch/x86/kernel/acpi/boot.c | 5 +++++
 include/xen/acpi.h          | 2 +-
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/acpi.h b/arch/x86/include/asm/acpi.h
index c8a7fc23f63c..6001df87526e 100644
--- a/arch/x86/include/asm/acpi.h
+++ b/arch/x86/include/asm/acpi.h
@@ -63,6 +63,9 @@ static inline void acpi_disable_pci(void)
 /* Low-level suspend routine. */
 extern int (*acpi_suspend_lowlevel)(void);

+/* To override @acpi_suspend_lowlevel at early boot */
+void acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void));
+
 /* Physical address to resume after wakeup */
 unsigned long acpi_get_wakeup_address(void);

diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2a0ea38955df..f9858b108387 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -784,6 +784,11 @@ int (*acpi_suspend_lowlevel)(void) =
x86_acpi_suspend_lowlevel;
 int (*acpi_suspend_lowlevel)(void);
 #endif

+void __init acpi_set_suspend_lowlevel(int (*suspend_lowlevel)(void))
+{
+       acpi_suspend_lowlevel = suspend_lowlevel;
+}
+
 /*
  * success: return IRQ number (>=0)
  * failure: return < 0
diff --git a/include/xen/acpi.h b/include/xen/acpi.h
index b1e11863144d..81a1b6ee8fc2 100644
--- a/include/xen/acpi.h
+++ b/include/xen/acpi.h
@@ -64,7 +64,7 @@ static inline void xen_acpi_sleep_register(void)
                acpi_os_set_prepare_extended_sleep(
                        &xen_acpi_notify_hypervisor_extended_sleep);

-               acpi_suspend_lowlevel = xen_acpi_suspend_lowlevel;
+               acpi_set_suspend_lowlevel(xen_acpi_suspend_lowlevel);
        }
 }
 #else
diff mbox series

Patch

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index e82f0adeea4d..1d0f1045dd33 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -28,10 +28,12 @@ 
 #include <linux/sort.h>
 #include <linux/log2.h>
 #include <linux/reboot.h>
+#include <linux/suspend.h>
 #include <asm/msr-index.h>
 #include <asm/msr.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
+#include <asm/acpi.h>
 #include <asm/tdx.h>
 #include "tdx.h"
 
@@ -1427,6 +1429,22 @@  static int __init tdx_init(void)
 		return -ENODEV;
 	}
 
+#define HIBERNATION_MSG		\
+	"Disable TDX due to hibernation is available. Use 'nohibernate' command line to disable hibernation."
+	/*
+	 * Note hibernation_available() can vary when it is called at
+	 * runtime as it checks secretmem_active() and cxl_mem_active()
+	 * which can both vary at runtime.  But here at early_init() they
+	 * both cannot return true, thus when hibernation_available()
+	 * returns false here, hibernation is disabled by either
+	 * 'nohibernate' or LOCKDOWN_HIBERNATION security lockdown,
+	 * which are both permanent.
+	 */
+	if (hibernation_available()) {
+		pr_err("initialization failed: %s\n", HIBERNATION_MSG);
+		return -ENODEV;
+	}
+
 	err = register_memory_notifier(&tdx_memory_nb);
 	if (err) {
 		pr_err("initialization failed: register_memory_notifier() failed (%d)\n",
@@ -1442,6 +1460,11 @@  static int __init tdx_init(void)
 		return -ENODEV;
 	}
 
+#ifdef CONFIG_ACPI
+	pr_info("Disable ACPI S3 suspend. Turn off TDX in the BIOS to use ACPI S3.\n");
+	acpi_suspend_lowlevel = NULL;
+#endif
+
 	/*
 	 * Just use the first TDX KeyID as the 'global KeyID' and
 	 * leave the rest for TDX guests.