Message ID | c9365d42-c15d-4d93-acd8-106ca46cb7f3@suse.com |
---|---|
State | New |
Series | x86/PVH: expose OEMx ACPI tables to Dom0 |
On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
> What they contain we don't know, but we can't sensibly hide them. On my
> Skylake system OEM1 (with a description of "INTEL CPU EIST") is what
> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
>
> Further, OEM2 there has a description of "INTEL CPU HWP", while OEM4
> has "INTEL CPU CST". Pretty clearly all three need exposing for
> cpufreq and cpuidle to work.
>
> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
> Signed-off-by: Jan Beulich <jbeulich@suse.com>

Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

> ---
> Sadly the commit above says nothing at all about the criteria by
> which tables would want to be whitelisted.

I think at that point it was mostly an allow list that enabled me to
boot PVH on the systems I was testing. I don't think it was intended
to be complete, but rather something that we would expand as needed.
There were many, and bigger, missing pieces of PVH dom0 when that was
committed.

> Further tables on said system
> which weren't blacklisted prior to that commit, yet which also aren't
> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
> of them it seems pretty clear to me that at least some would want
> whitelisting, too.

I cannot find any reference to FIDT, MIGT, NITR, SVOS or WDDT in
the ACPI spec.

The MSCT I think we don't want to expose, as it's related to topology
data.

Regarding RASF, I would be slightly worried about the patrol scrub
feature. The memory map exposed to dom0 will be different from the
native one, and there's also the interposed p2m.

The rest are likely fine to expose.
>
> --- a/xen/arch/x86/hvm/dom0_build.c
> +++ b/xen/arch/x86/hvm/dom0_build.c
> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
>              return true;
>          else
>          {
> +        skip:
>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
>                     sig);
>              return false;
>          }
>      }
>
> +    if ( !strncmp(sig, "OEM", 3) )
> +    {
> +        if ( acpi_memory_banned(address, size) )
> +            goto skip;
> +        return true;
> +    }

I may have put this ahead of the loop, so that the goto label doesn't
go backwards (which always feels weird to me).

> +

I wonder if additionally we should print tables filtered to dom0 here:

if ( opt_dom0_verbose )
    printk("Hidden ACPI Table %.4s\n", sig);

So that it's more obvious which tables are not exposed.

Thanks,
Roger.
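Roger's suggestion above can be sketched as a small user-space model in which the final `return false` path first reports the hidden table. This is a signature-only sketch under stated assumptions: the memory-region check of the real function is left out, `opt_dom0_verbose` is modeled as a plain global, `allowed_tables[]` is an illustrative subset rather than Xen's actual list, and `count_exposed()` is a hypothetical helper added only for the demonstration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ACPI_NAME_SIZE 4 /* ACPI signatures are 4 bytes, not NUL-terminated */

/* Stand-in for Xen's opt_dom0_verbose; a plain global in this sketch. */
static bool opt_dom0_verbose = true;

/* Illustrative subset of an allow list, NOT Xen's actual allowed_tables[]. */
static const char allowed_tables[][ACPI_NAME_SIZE + 1] = {
    "DSDT", "FACP", "FACS", "SSDT",
};

/* Signature-only check: the memory-region test of the real code is omitted. */
static bool table_allowed(const char *sig)
{
    size_t i;

    for ( i = 0; i < sizeof(allowed_tables) / sizeof(allowed_tables[0]); i++ )
        if ( !strncmp(sig, allowed_tables[i], ACPI_NAME_SIZE) )
            return true;

    /* OEMx tables are passed through by prefix, as in the patch. */
    if ( !strncmp(sig, "OEM", 3) )
        return true;

    /* The suggested extra message for tables hidden from Dom0. */
    if ( opt_dom0_verbose )
        printf("Hidden ACPI Table %.4s\n", sig);

    return false;
}

/* Hypothetical helper: how many of the nr signatures would be exposed. */
static unsigned int count_exposed(const char (*sigs)[ACPI_NAME_SIZE + 1],
                                  unsigned int nr)
{
    unsigned int i, exposed = 0;

    for ( i = 0; i < nr; i++ )
        if ( table_allowed(sigs[i]) )
            exposed++;

    return exposed;
}
```

Fed the sixteen signatures named in this thread (OEM1, OEM2, OEM4 and the thirteen listed by Jan), this model exposes the three OEMx tables and prints one "Hidden ACPI Table" line for each of the other thirteen, which is the kind of visibility the suggestion is after.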
On 26.03.2025 10:21, Roger Pau Monné wrote:
> On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
>> What they contain we don't know, but we can't sensibly hide them. On my
>> Skylake system OEM1 (with a description of "INTEL CPU EIST") is what
>> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
>> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
>>
>> Further, OEM2 there has a description of "INTEL CPU HWP", while OEM4
>> has "INTEL CPU CST". Pretty clearly all three need exposing for
>> cpufreq and cpuidle to work.
>>
>> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>
> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>

Thanks.

>> ---
>> Sadly the commit above says nothing at all about the criteria by
>> which tables would want to be whitelisted.
>
> I think at that point it was mostly an allow list that enabled me to
> boot PVH on the systems I was testing. I don't think it was intended
> to be complete, but rather something that we would expand as needed.
> There were many, and bigger, missing pieces of PVH dom0 when that was
> committed.

To be frank, with that it was premature to declare PVH Dom0 fully supported.
This aspect also isn't mentioned in the caveats in SUPPORT.md.

>> Further tables on said system
>> which weren't blacklisted prior to that commit, yet which also aren't
>> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
>> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
>> of them it seems pretty clear to me that at least some would want
>> whitelisting, too.
>
> I cannot find any reference to FIDT, MIGT, NITR, SVOS or WDDT in
> the ACPI spec.

WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)

> The MSCT I think we don't want to expose, as it's related to topology
> data.
>
> Regarding RASF, I would be slightly worried about the patrol scrub
> feature. The memory map exposed to dom0 will be different from the
> native one, and there's also the interposed p2m.

Thing is - either kind of Dom0 needs to have a sufficient level of insight
into the host memory map to support memory-related RAS features. Which may
mean that RASF may only be exposed if the Dom0 kernel declares itself as
aware of the need to consider data there to refer to a separate address
space.

>> --- a/xen/arch/x86/hvm/dom0_build.c
>> +++ b/xen/arch/x86/hvm/dom0_build.c
>> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
>>              return true;
>>          else
>>          {
>> +        skip:
>>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
>>                     sig);
>>              return false;
>>          }
>>      }
>>
>> +    if ( !strncmp(sig, "OEM", 3) )
>> +    {
>> +        if ( acpi_memory_banned(address, size) )
>> +            goto skip;
>> +        return true;
>> +    }
>
> I may have put this ahead of the loop, so that the goto label doesn't
> go backwards (which always feels weird to me).

It felt odd to me to put it first; I'm almost always hesitant to add stuff
to the front of something that's already there, due to the possible
implication of "what I add is more important than what was there before".

As to label vs goto placement: It's the other way around for me. C wants
everything else declared before use. Hence I prefer to have labels appear
before their use. I'm actually puzzled by Misra not sharing that view,
and instead having Rule 15.2 (which we haven't adopted yet afaics) to
demand the opposite (and assuming Rule 15.1 is being violated in the
first place).

>> +
>
> I wonder if additionally we should print tables filtered to dom0 here:
>
> if ( opt_dom0_verbose )
>     printk("Hidden ACPI Table %.4s\n", sig);
>
> So that it's more obvious which tables are not exposed.

I, too, thought about that, but assumed it may not have been done before
for a reason. Plus it would want to be a separate change anyway, imo.

Jan
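To make the label-placement question concrete, here is a minimal user-space sketch of the ordering Roger suggests (not the posted patch): the OEMx check sits ahead of the loop, so `goto skip` jumps forward to a label declared later, which is the direction MISRA C:2012 Rule 15.2 asks for. Assumptions: `allowed_tables[]` is an illustrative subset rather than Xen's list, `acpi_memory_banned()` is a hypothetical stub that treats only a made-up window [0x7f000000, 0x80000000) as ACPI/reserved memory, and `printk()` is replaced by `printf()`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ACPI_NAME_SIZE 4

/* Illustrative subset only; not Xen's actual allowed_tables[]. */
static const char allowed_tables[][ACPI_NAME_SIZE + 1] = {
    "DSDT", "FACP", "FACS", "SSDT",
};

/*
 * Hypothetical stand-in for Xen's acpi_memory_banned(): pretend only
 * [0x7f000000, 0x80000000) is ACPI/reserved memory; anything else is banned.
 */
static bool acpi_memory_banned(unsigned long address, unsigned long size)
{
    const unsigned long start = 0x7f000000UL, end = 0x80000000UL;

    return address < start || address >= end || size > end - address;
}

static bool pvh_acpi_table_allowed(const char *sig, unsigned long address,
                                   unsigned long size)
{
    unsigned int i;

    /* OEMx check first, so the goto below jumps forward to its label. */
    if ( !strncmp(sig, "OEM", 3) )
    {
        if ( acpi_memory_banned(address, size) )
            goto skip;
        return true;
    }

    for ( i = 0; i < sizeof(allowed_tables) / sizeof(allowed_tables[0]); i++ )
    {
        if ( strncmp(sig, allowed_tables[i], ACPI_NAME_SIZE) )
            continue;

        if ( !acpi_memory_banned(address, size) )
            return true;
        else
        {
        skip:
            printf("Skipping table %.4s in non-ACPI non-reserved region\n",
                   sig);
            return false;
        }
    }

    return false;
}
```

Because no entry in this sketch's list starts with "OEM", moving the check does not change which tables are exposed; only the textual direction of the jump differs. The jump still enters a nested block, which plain C permits here since no variably modified types are in scope.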
On Wed, Mar 26, 2025 at 10:45:18AM +0100, Jan Beulich wrote:
> On 26.03.2025 10:21, Roger Pau Monné wrote:
> > On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
> >> What they contain we don't know, but we can't sensibly hide them. On my
> >> Skylake system OEM1 (with a description of "INTEL CPU EIST") is what
> >> contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
> >> needed for cpufreq. (_PSD interestingly are in an SSDT there.)
> >>
> >> Further, OEM2 there has a description of "INTEL CPU HWP", while OEM4
> >> has "INTEL CPU CST". Pretty clearly all three need exposing for
> >> cpufreq and cpuidle to work.
> >>
> >> Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
> >> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> >
> > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
> Thanks.
>
> >> ---
> >> Sadly the commit above says nothing at all about the criteria by
> >> which tables would want to be whitelisted.
> >
> > I think at that point it was mostly an allow list that enabled me to
> > boot PVH on the systems I was testing. I don't think it was intended
> > to be complete, but rather something that we would expand as needed.
> > There were many, and bigger, missing pieces of PVH dom0 when that was
> > committed.
>
> To be frank, with that it was premature to declare PVH Dom0 fully supported.
> This aspect also isn't mentioned in the caveats in SUPPORT.md.

It's supported with caveats, so I wouldn't call it fully supported.
See for example the recent addition of the pf-fixup option.

Note the wording in SUPPORT.md:

"PVH dom0 hasn't received the same test coverage as PV dom0, so it can
exhibit unexpected behavior or issues on some hardware.

At least the following features are missing on a PVH dom0:"

I think it's quite clear from the usage of "at least" that the list of
caveats might not be complete. The missing ACPI tables would just be
one extra caveat, which sadly we didn't list.
> >> Further tables on said system
> >> which weren't blacklisted prior to that commit, yet which also aren't
> >> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
> >> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
> >> of them it seems pretty clear to me that at least some would want
> >> whitelisting, too.
> >
> > I cannot find any reference to FIDT, MIGT, NITR, SVOS or WDDT in
> > the ACPI spec.
>
> WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)
>
> > The MSCT I think we don't want to expose, as it's related to topology
> > data.
> >
> > Regarding RASF, I would be slightly worried about the patrol scrub
> > feature. The memory map exposed to dom0 will be different from the
> > native one, and there's also the interposed p2m.
>
> Thing is - either kind of Dom0 needs to have a sufficient level of insight
> into the host memory map to support memory-related RAS features. Which may
> mean that RASF may only be exposed if the Dom0 kernel declares itself as
> aware of the need to consider data there to refer to a separate address
> space.

Yes, but then, how is a PVH dom0 going to be aware of such addresses?
Given the automatic translation of gfn -> mfn, that's completely hidden
from dom0.

I'm not saying it can't be done, but I think RASF shouldn't be exposed
to a PVH dom0 until we understand how such a feature is supposed to work
with the interposed p2m and the fabricated memory map available to a
PVH dom0. Note a PVH dom0 can still get the host memory map from the
XENMEM_machine_memory_map hypercall.
> >> --- a/xen/arch/x86/hvm/dom0_build.c
> >> +++ b/xen/arch/x86/hvm/dom0_build.c
> >> @@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
> >>              return true;
> >>          else
> >>          {
> >> +        skip:
> >>              printk("Skipping table %.4s in non-ACPI non-reserved region\n",
> >>                     sig);
> >>              return false;
> >>          }
> >>      }
> >>
> >> +    if ( !strncmp(sig, "OEM", 3) )
> >> +    {
> >> +        if ( acpi_memory_banned(address, size) )
> >> +            goto skip;
> >> +        return true;
> >> +    }
> >
> > I may have put this ahead of the loop, so that the goto label doesn't
> > go backwards (which always feels weird to me).
>
> It felt odd to me to put it first; I'm almost always hesitant to add stuff
> to the front of something that's already there, due to the possible
> implication of "what I add is more important than what was there before".
>
> As to label vs goto placement: It's the other way around for me. C wants
> everything else declared before use. Hence I prefer to have labels appear
> before their use. I'm actually puzzled by Misra not sharing that view,
> and instead having Rule 15.2 (which we haven't adopted yet afaics) to
> demand the opposite (and assuming Rule 15.1 is being violated in the
> first place).

Oh, I see. No strong opinion really.

> >> +
> >
> > I wonder if additionally we should print tables filtered to dom0 here:
> >
> > if ( opt_dom0_verbose )
> >     printk("Hidden ACPI Table %.4s\n", sig);
> >
> > So that it's more obvious which tables are not exposed.
>
> I, too, thought about that, but assumed it may not have been done before
> for a reason. Plus it would want to be a separate change anyway, imo.

Indeed, it should be a separate change; sorry if my comment made it
look like I would rather merge it with the current commit.

Thanks,
Roger.
On 26.03.2025 11:18, Roger Pau Monné wrote:
> On Wed, Mar 26, 2025 at 10:45:18AM +0100, Jan Beulich wrote:
>> On 26.03.2025 10:21, Roger Pau Monné wrote:
>>> On Wed, Mar 26, 2025 at 09:45:09AM +0100, Jan Beulich wrote:
>>>> Further tables on said system
>>>> which weren't blacklisted prior to that commit, yet which also aren't
>>>> whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
>>>> SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
>>>> of them it seems pretty clear to me that at least some would want
>>>> whitelisting, too.
>>>
>>> I cannot find any reference to FIDT, MIGT, NITR, SVOS or WDDT in
>>> the ACPI spec.
>>
>> WDDT - Watchdog Descriptor Table (Table 5.6 in spec version 6.5)
>>
>>> The MSCT I think we don't want to expose, as it's related to topology
>>> data.
>>>
>>> Regarding RASF, I would be slightly worried about the patrol scrub
>>> feature. The memory map exposed to dom0 will be different from the
>>> native one, and there's also the interposed p2m.
>>
>> Thing is - either kind of Dom0 needs to have a sufficient level of insight
>> into the host memory map to support memory-related RAS features. Which may
>> mean that RASF may only be exposed if the Dom0 kernel declares itself as
>> aware of the need to consider data there to refer to a separate address
>> space.
>
> Yes, but then, how is a PVH dom0 going to be aware of such addresses?
> Given the automatic translation of gfn -> mfn, that's completely hidden
> from dom0.

Right, but the knowledge is going to be necessary there, I expect.

> I'm not saying it can't be done, but I think RASF shouldn't be exposed
> to a PVH dom0 until we understand how such a feature is supposed to work
> with the interposed p2m and the fabricated memory map available to a
> PVH dom0. Note a PVH dom0 can still get the host memory map from the
> XENMEM_machine_memory_map hypercall.

Oh, of course I agree we can't expose such a table without first having
dealt with the implications.

Jan
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1010,12 +1010,20 @@ static bool __init pvh_acpi_table_allowe
             return true;
         else
         {
+        skip:
             printk("Skipping table %.4s in non-ACPI non-reserved region\n",
                    sig);
             return false;
         }
     }
 
+    if ( !strncmp(sig, "OEM", 3) )
+    {
+        if ( acpi_memory_banned(address, size) )
+            goto skip;
+        return true;
+    }
+
     return false;
 }
 
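For checking the behaviour of the hunk above in isolation, here is a minimal user-space model of the patched pvh_acpi_table_allowed(). It is a sketch under stated assumptions: `allowed_tables[]` is an illustrative subset rather than Xen's list, `acpi_memory_banned()` is a hypothetical stub that treats only a made-up window [0x7f000000, 0x80000000) as ACPI/reserved memory, `printk()` becomes `printf()`, and the parts of the real function outside the hunk are assumed. It keeps the patch's structure, including the backward `goto skip` into the else branch.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define ACPI_NAME_SIZE 4

/* Illustrative subset only; not Xen's actual allowed_tables[]. */
static const char allowed_tables[][ACPI_NAME_SIZE + 1] = {
    "DSDT", "FACP", "FACS", "SSDT",
};

/*
 * Hypothetical stand-in for Xen's acpi_memory_banned(): pretend only
 * [0x7f000000, 0x80000000) is ACPI/reserved memory; anything else is banned.
 */
static bool acpi_memory_banned(unsigned long address, unsigned long size)
{
    const unsigned long start = 0x7f000000UL, end = 0x80000000UL;

    return address < start || address >= end || size > end - address;
}

static bool pvh_acpi_table_allowed(const char *sig, unsigned long address,
                                   unsigned long size)
{
    unsigned int i;

    for ( i = 0; i < sizeof(allowed_tables) / sizeof(allowed_tables[0]); i++ )
    {
        if ( strncmp(sig, allowed_tables[i], ACPI_NAME_SIZE) )
            continue;

        if ( !acpi_memory_banned(address, size) )
            return true;
        else
        {
        skip: /* also reached from the OEMx check below */
            printf("Skipping table %.4s in non-ACPI non-reserved region\n",
                   sig);
            return false;
        }
    }

    /* OEMx: vendor-defined contents, passed through by prefix match. */
    if ( !strncmp(sig, "OEM", 3) )
    {
        if ( acpi_memory_banned(address, size) )
            goto skip;
        return true;
    }

    return false;
}
```

In this model an OEM1 table inside the fake window is exposed, while an OEMx table that lies in, or straddles into, a banned region takes the `goto skip` path and is reported exactly like an allow-listed table would be.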
What they contain we don't know, but we can't sensibly hide them. On my
Skylake system OEM1 (with a description of "INTEL CPU EIST") is what
contains all the _PCT, _PPC, and _PSS methods, i.e. about everything
needed for cpufreq. (_PSD interestingly are in an SSDT there.)

Further, OEM2 there has a description of "INTEL CPU HWP", while OEM4
has "INTEL CPU CST". Pretty clearly all three need exposing for
cpufreq and cpuidle to work.

Fixes: 8b1a5268daf0 ("pvh/dom0: whitelist PVH Dom0 ACPI tables")
Signed-off-by: Jan Beulich <jbeulich@suse.com>
---
Sadly the commit above says nothing at all about the criteria by which
tables would want to be whitelisted. Further tables on said system
which weren't blacklisted prior to that commit, yet which also aren't
whitelisted are DBGP, DBG2, FIDT, LPIT, MIGT, MSCT, NITR, PCCT, RASF,
SVOS, UEFI, WDDT, and WSMT. Even without diving into the details of any
of them it seems pretty clear to me that at least some would want
whitelisting, too.