
x86/PVH: expose OEMx ACPI tables to Dom0

Message ID c9365d42-c15d-4d93-acd8-106ca46cb7f3@suse.com
State New
Series x86/PVH: expose OEMx ACPI tables to Dom0

Commit Message

Jan Beulich March 26, 2025, 8:45 a.m. UTC
What they contain we don't know, but we can't sensibly hide them. On my
Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
needed for cpufreq. (_PSD interestingly are in an SSDT there.)

Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
has "INTEL  CPU  CST". Pretty clearly all three need exposing for
cpufreq and cpuidle to work.

Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Sadly the commit above says nothing at all about the criteria used to decide
which tables would want to be whitelisted. Further tables on said system
which weren't blacklisted prior to that commit, yet which also aren't
whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
of them it seems pretty clear to me that at least some would want
whitelisting, too.
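Background on the descriptions quoted in the commit message, such as "INTEL  CPU EIST": every ACPI system description table starts with the standard 36-byte header from the ACPI specification. Those strings are presumably the 6-byte OEM ID followed by the 8-byte OEM Table ID, as commonly printed by table dumpers. The specification also reserves signatures beginning with "OEM" for OEM-specific tables, which is why the patch can match on a three-character prefix. The sketch below is illustrative only. The struct and helper names are hypothetical and are not Xen's or ACPICA's definitions.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Standard 36-byte ACPI System Description Table header, following the
 * ACPI specification layout. Field names are illustrative. The character
 * fields are space padded and NOT NUL terminated.
 */
struct sdt_header {
    char     signature[4];     /* e.g. "OEM1" or "SSDT" */
    uint32_t length;           /* whole table, header included, in bytes */
    uint8_t  revision;
    uint8_t  checksum;         /* makes all table bytes sum to 0 (mod 256) */
    char     oem_id[6];        /* e.g. "INTEL " */
    char     oem_table_id[8];  /* e.g. "CPU EIST" */
    uint32_t oem_revision;
    char     creator_id[4];
    uint32_t creator_revision;
};

/* Signatures starting with "OEM" are reserved for OEM-specific tables. */
static bool is_oem_table(const char *sig)
{
    return !strncmp(sig, "OEM", 3);
}

/*
 * Render "<signature> <OEM ID> <OEM table ID>". The precisions bound each
 * read to the field width, so the missing NUL terminators are harmless.
 */
static int describe_table(const struct sdt_header *h, char *buf, size_t len)
{
    return snprintf(buf, len, "%.4s %.6s %.8s",
                    h->signature, h->oem_id, h->oem_table_id);
}
```

With OEM ID "INTEL " and OEM Table ID "CPU EIST", `describe_table()` yields "OEM1 INTEL  CPU EIST". The double space comes from the padding in the OEM ID.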

Comments

Roger Pau Monné March 26, 2025, 9:21 a.m. UTC | #1
On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
> What they contain we don't know, but we can't sensibly hide them. On my
> Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
> 
> Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
> has "INTEL  CPU  CST". Pretty clearly all three need exposing for
> cpufreq and cpuidle to work.
> 
> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> Sadly the commit above says nothing at all about the criteria used to decide
> which tables would want to be whitelisted.

I think at that point it was mostly an allow list that enabled me to
boot PVH on the systems I was testing.  I don't think it was intended
to be complete, but rather something that we would expand as needed.
There were many, and bigger, missing pieces of PVH dom0 when that was
committed.

> Further tables on said system
> which weren't blacklisted prior to that commit, yet which also aren't
> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
> of them it seems pretty clear to me that at least some would want
> whitelisting, too.

I cannot find any reference about: FIDT, MIGT, NITR, SVOS and WDDT in
the ACPI spec.

The MSCT I think we don't want to expose, as it's related to topology
data.

Regarding RASF I would be slightly worried about the patrol scrub
feature.  The memory map exposed to dom0 will be different from the
native one, and there's also the interposed p2m.

The rest are likely fine to expose.

> 
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
>              return true;
>          else
>          {
> +    skip:
>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
>                     sig);
>              return false;
>          }
>      }
>  
> +    if ( !strncmp(sig, "OEM", 3) )
> +    {
> +        if ( acpi_memory_banned(address, size) )
> +            goto skip;
> +        return true;
> +    }

I may have put this ahead of the loop, so that the goto label doesn't
go backwards (which always feels weird to me).

> +

I wonder if additionally we should print tables filtered to dom0 here:

if ( opt_dom0_verbose )
    printk("Hidden ACPI Table %.4s\n", sig);

So that it's more obvious which tables are not exposed.

Thanks, Roger.
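Roger's logging suggestion (which both later agree belongs in a separate change) can be sketched in isolation as follows. This is a minimal, self-contained model, not Xen code. `opt_verbose` stands in for `opt_dom0_verbose` and `printf` for `printk`. The allow list is an illustrative subset rather than the list Xen actually uses, and the patch's memory-region check is left out. The print sits just before the final `return false`, which is where the suggestion places it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for Xen's opt_dom0_verbose. */
static bool opt_verbose = true;

/* Illustrative subset only; not the allow list Xen actually uses. */
static const char allowed_tables[][5] = { "DSDT", "SSDT", "MCFG", "FPDT" };

static bool table_exposed(const char *sig)
{
    unsigned int i;

    for ( i = 0; i < sizeof(allowed_tables) / sizeof(allowed_tables[0]); i++ )
        if ( !strncmp(sig, allowed_tables[i], 4) )
            return true;

    /* OEMx tables, as added by the patch (memory check omitted here). */
    if ( !strncmp(sig, "OEM", 3) )
        return true;

    /* Suggested addition: make every hidden table visible in the log. */
    if ( opt_verbose )
        printf("Hidden ACPI Table %.4s\n", sig);

    return false;
}
```

On the system described in the commit message, calls such as `table_exposed("RASF")` or `table_exposed("MSCT")` would then leave a trace in the boot log instead of silently dropping the table.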
Jan Beulich March 26, 2025, 9:45 a.m. UTC | #2
On 26.03.2025 10:21, Roger Pau Monné wrote:
> On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
>> What they contain we don't know, but we can't sensibly hide them. On my
>> Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
>> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
>> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
>>
>> Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
>> has "INTEL  CPU  CST". Pretty clearly all three need exposing for
>> cpufreq and cpuidle to work.
>>
>> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> 
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> ---
>> Sadly the commit above says nothing at all about the criteria used to decide
>> which tables would want to be whitelisted.
> 
> I think at that point it was mostly an allow list that enabled me to
> boot PVH on the systems I was testing.  I don't think it was intended
> to be complete, but rather something that we would expand as needed.
> There were many, and bigger, missing pieces of PVH dom0 when that was
> committed.

To be frank, with that it was premature to declare PVH Dom0 fully supported.
This aspect also isn't mentioned in the caveats in SUPPORT.md.

>> Further tables on said system
>> which weren't blacklisted prior to that commit, yet which also aren't
>> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
>> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
>> of them it seems pretty clear to me that at least some would want
>> whitelisting, too.
> 
> I cannot find any reference about: FIDT, MIGT, NITR, SVOS and WDDT in
> the ACPI spec.

WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)

> The MSCT I think we don't want to expose, as it's related to topology
> data.
> 
> Regarding RASF I would be slightly worried about the patrol scrub
> feature.  The memory map exposed to dom0 will be different from the
> native one, and there's also the interposed p2m.

Thing is - either kind of Dom0 needs to have a sufficient level of insight
into the host memory map to support memory-related RAS features. Which may
mean that RASF may only be exposed if the Dom0 kernel declares itself as
aware of the need to consider data there to refer to a separate address
space.

>> --- a/xen/arch/x86/hvm/dom0_build.c
>> +++ b/xen/arch/x86/hvm/dom0_build.c
>> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
>>              return true;
>>          else
>>          {
>> +    skip:
>>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
>>                     sig);
>>              return false;
>>          }
>>      }
>>  
>> +    if ( !strncmp(sig, "OEM", 3) )
>> +    {
>> +        if ( acpi_memory_banned(address, size) )
>> +            goto skip;
>> +        return true;
>> +    }
> 
> I may have put this ahead of the loop, so that the goto label doesn't
> go backwards (which always feels weird to me).

It felt odd to me to put it first; I'm almost always hesitant to add stuff
to the front of something that's already there, due to the possible
implication of "what I add is more important than what was there before".

As to label vs goto placement: It's the other way around for me. C wants
everything else declared before use. Hence I prefer to have labels appear
before their use. I'm actually puzzled by MISRA not sharing that view,
and instead having Rule 15.2 (which we haven't adopted yet afaics)
demand the opposite (which of course presupposes that Rule 15.1 is
being violated in the first place).

>> +
> 
> I wonder if additionally we should print tables filtered to dom0 here:
> 
> if ( opt_dom0_verbose )
>     printk("Hidden ACPI Table %.4s\n", sig);
> 
> So that it's more obvious which tables are not exposed.

I, too, thought about that, but assumed it may not have been done before
for a reason. Plus it would want to be a separate change anyway, imo.

Jan
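On the label placement point: OEMx signatures never match an entry in the allow list, so checking them before or after the loop gives the same result. One possible arrangement that keeps every `goto` jumping forward, which is the property MISRA Rule 15.2 asks for, moves the OEM check ahead of the loop and the shared message to the end of the function. This goes a step beyond Roger's literal suggestion and is only a sketch. `memory_banned()` and the single ACPI/reserved region it consults are hypothetical stand-ins for `acpi_memory_banned()` and the host memory map, and the allow list is illustrative.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single host region of ACPI/reserved memory. */
#define ACPI_REGION_START 0x7f000000ULL
#define ACPI_REGION_END   0x7f100000ULL

/* Stand-in for acpi_memory_banned(): true unless fully inside the region. */
static bool memory_banned(uint64_t addr, uint64_t size)
{
    if ( addr < ACPI_REGION_START || addr >= ACPI_REGION_END )
        return true;
    return size > ACPI_REGION_END - addr;
}

static bool table_allowed(const char *sig, uint64_t addr, uint64_t size)
{
    /* Illustrative subset only; not Xen's real allow list. */
    static const char allowed[][5] = { "DSDT", "SSDT", "MCFG" };
    unsigned int i;

    /* OEMx check first, so the goto below jumps forward. */
    if ( !strncmp(sig, "OEM", 3) )
    {
        if ( memory_banned(addr, size) )
            goto skip;
        return true;
    }

    for ( i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++ )
    {
        if ( strncmp(sig, allowed[i], 4) )
            continue;
        if ( memory_banned(addr, size) )
            goto skip;
        return true;
    }

    return false;

 skip:
    printf("Skipping table %.4s in non-ACPI non-reserved region\n", sig);
    return false;
}
```

The trade-off is the one Jan describes. The new check now reads as if it took precedence over the existing loop, even though the order has no effect on the result.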
Roger Pau Monné March 26, 2025, 10:18 a.m. UTC | #3
On Wed, Mar 26, 2025 at 10:45:18AM +0100, Jan Beulich wrote:
> On 26.03.2025 10:21, Roger Pau Monné wrote:
> > On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
> >> What they contain we don't know, but we can't sensibly hide them. On my
> >> Skylake system OEM1 (with a description of "INTEL  CPU EIST") is what
> >> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
> >> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
> >>
> >> Further OEM2 there has a description of "INTEL  CPU  HWP", while OEM4
> >> has "INTEL  CPU  CST". Pretty clearly all three need exposing for
> >> cpufreq and cpuidle to work.
> >>
> >> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> > 
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Thanks.
> 
> >> ---
> >> Sadly the commit above says nothing at all about the criteria used to decide
> >> which tables would want to be whitelisted.
> > 
> > I think at that point it was mostly an allow list that enabled me to
> > boot PVH on the systems I was testing.  I don't think it was intended
> > to be complete, but rather something that we would expand as needed.
> > There were many, and bigger, missing pieces of PVH dom0 when that was
> > committed.
> 
> To be frank, with that it was premature to declare PVH Dom0 fully supported.
> This aspect also isn't mentioned in the caveats in SUPPORT.md.

It's supported with caveats, so I wouldn't call it fully supported.
See for example the recent addition of the pf-fixup option.

Note the wording in SUPPORT.md:

"PVH dom0 hasn't received the same test coverage as PV dom0, so it can exhibit
unexpected behavior or issues on some hardware.

At least the following features are missing on a PVH dom0:"

I think it's quite clear from the usage of "at least" that the list of
caveats might not be complete.  The missing ACPI tables would just be
one extra caveat, which sadly we didn't list.

> >> Further tables on said system
> >> which weren't blacklisted prior to that commit, yet which also aren't
> >> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
> >> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
> >> of them it seems pretty clear to me that at least some would want
> >> whitelisting, too.
> > 
> > I cannot find any reference about: FIDT, MIGT, NITR, SVOS and WDDT in
> > the ACPI spec.
> 
> WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)
> 
> > The MSCT I think we don't want to expose, as it's related to topology
> > data.
> > 
> > Regarding RASF I would be slightly worried about the patrol scrub
> > feature.  The memory map exposed to dom0 will be different from the
> > native one, and there's also the interposed p2m.
> 
> Thing is - either kind of Dom0 needs to have a sufficient level of insight
> into the host memory map to support memory-related RAS features. Which may
> mean that RASF may only be exposed if the Dom0 kernel declares itself as
> aware of the need to consider data there to refer to a separate address
> space.

Yes, but then, how is a PVH dom0 going to be aware of such addresses?
Given the automatic translation of gfn -> mfn that's completely hidden
from dom0.

I'm not saying it can't be done, but I think RASF shouldn't be exposed
to a PVH dom0 until we understand how such a feature is supposed to work
with the interposed p2m and the fabricated memory map available to a
PVH dom0.  Note a PVH dom0 can still get the host memory map from the
XENMEM_machine_memory_map hypercall.

> >> --- a/xen/arch/x86/hvm/dom0_build.c
> >> +++ b/xen/arch/x86/hvm/dom0_build.c
> >> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
> >>              return true;
> >>          else
> >>          {
> >> +    skip:
> >>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
> >>                     sig);
> >>              return false;
> >>          }
> >>      }
> >>  
> >> +    if ( !strncmp(sig, "OEM", 3) )
> >> +    {
> >> +        if ( acpi_memory_banned(address, size) )
> >> +            goto skip;
> >> +        return true;
> >> +    }
> > 
> > I may have put this ahead of the loop, so that the goto label doesn't
> > go backwards (which always feels weird to me).
> 
> It felt odd to me to put it first; I'm almost always hesitant to add stuff
> to the front of something that's already there, due to the possible
> implication of "what I add is more important than what was there before".
> 
> As to label vs goto placement: It's the other way around for me. C wants
> everything else declared before use. Hence I prefer to have labels appear
> before their use. I'm actually puzzled by MISRA not sharing that view,
> and instead having Rule 15.2 (which we haven't adopted yet afaics)
> demand the opposite (which of course presupposes that Rule 15.1 is
> being violated in the first place).

Oh, I see.  No strong opinion really.

> >> +
> > 
> > I wonder if additionally we should print tables filtered to dom0 here:
> > 
> > if ( opt_dom0_verbose )
> >     printk("Hidden ACPI Table %.4s\n", sig);
> > 
> > So that it's more obvious which tables are not exposed.
> 
> I, too, thought about that, but assumed it may not have been done before
> for a reason. Plus it would want to be a separate change anyway, imo.

Indeed, it should be a separate change; sorry if my comment made it
look like I would rather have it merged into the current commit.

Thanks, Roger.
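To make the RASF concern concrete: tables such as RASF describe host physical addresses, whereas a PVH dom0 only sees guest frames that the p2m silently maps to host frames. The toy model below is purely illustrative. All names are hypothetical and nothing here is a Xen interface. It shows that a host frame reported by firmware must be translated back to a guest frame before dom0 could act on it. Reading the frame directly as a dom0 frame number would refer to an unrelated, or nonexistent, frame. How dom0 would obtain such a reverse mapping is exactly the open question in the thread.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_INVALID_GFN UINT64_MAX

/* Toy p2m for a four-page guest: index is the gfn, value the backing mfn. */
static const uint64_t toy_p2m[] = { 0x500, 0x501, 0x9a0, 0x123 };

/* Reverse lookup: which guest frame, if any, is backed by host frame mfn? */
static uint64_t toy_gfn_for_mfn(uint64_t mfn)
{
    size_t gfn;

    for ( gfn = 0; gfn < sizeof(toy_p2m) / sizeof(toy_p2m[0]); gfn++ )
        if ( toy_p2m[gfn] == mfn )
            return gfn;

    return TOY_INVALID_GFN;   /* host frame not mapped into this guest */
}
```

In this model a patrol-scrub report for host frame 0x9a0 concerns guest frame 2. A report for a host frame that the guest does not map at all has no meaning in dom0's own address space.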
Jan Beulich March 26, 2025, 10:21 a.m. UTC | #4
On 26.03.2025 11:18, Roger Pau Monné wrote:
> On Wed, Mar 26, 2025 at 10:45:18AM +0100, Jan Beulich wrote:
>> On 26.03.2025 10:21, Roger Pau Monné wrote:
>>> On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
>>>> Further tables on said system
>>>> which weren't blacklisted prior to that commit, yet which also aren't
>>>> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
>>>> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
>>>> of them it seems pretty clear to me that at least some would want
>>>> whitelisting, too.
>>>
>>> I cannot find any reference about: FIDT, MIGT, NITR, SVOS and WDDT in
>>> the ACPI spec.
>>
>> WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)
>>
>>> The MSCT I think we don't want to expose, as it's related to topology
>>> data.
>>>
>>> Regarding RASF I would be slightly worried about the patrol scrub
>>> feature.  The memory map exposed to dom0 will be different from the
>>> native one, and there's also the interposed p2m.
>>
>> Thing is - either kind of Dom0 needs to have a sufficient level of insight
>> into the host memory map to support memory-related RAS features. Which may
>> mean that RASF may only be exposed if the Dom0 kernel declares itself as
>> aware of the need to consider data there to refer to a separate address
>> space.
> 
> Yes, but then, how is a PVH dom0 going to be aware of such addresses?
> Given the automatic translation of gfn -> mfn that's completely hidden
> from dom0.

Right, but the knowledge is going to be necessary there, I expect.

> I'm not saying it can't be done, but I think RASF shouldn't be exposed
> to a PVH dom0 until we understand how such a feature is supposed to work
> with the interposed p2m and the fabricated memory map available to a
> PVH dom0.  Note a PVH dom0 can still get the host memory map from the
> XENMEM_machine_memory_map hypercall.

Oh, of course I agree we can't expose such without first having dealt with
the implications.

Jan

Patch

--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1010,12 +1010,20 @@  static bool __init pvh_acpi_table_allowe
             return true;
         else
         {
+    skip:
             printk("Skipping table %.4s in non-ACPI non-reserved region\n",
                    sig);
             return false;
         }
     }
 
+    if ( !strncmp(sig, "OEM", 3) )
+    {
+        if ( acpi_memory_banned(address, size) )
+            goto skip;
+        return true;
+    }
+
     return false;
 }