Message ID | 20230912162305.34339-1-roger.pau@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/amd: do not expose HWCR.TscFreqSel to guests | expand |
On 12/09/2023 5:23 pm, Roger Pau Monne wrote: > OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > set, and will also attempt to unconditionally access HWCR if the TSC is > reported as Invariant. > > The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > a suitable solution. > > In order to fix expose an empty HWCR. At first I was thinking a straight up revert, but AMD's CPUID Faulting is an architectural bit in here so it's worth keeping the register around. > > Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > Not sure whether we want to expose something when is_cpufreq_controller() is > true, seeing as there's a special wrmsr handler for the same MSR in that case. > Likely should be done for PV only, but also likely quite bogus. > > Missing reported by as the issue came from the QubesOS tracker. Well - we can at least have a: Link: https://github.com/QubesOS/qubes-issues/issues/8502 in the commit message, and it's probably worth asking Solène / Marek (both CC'd) if they want a Reported-by tag. > --- > xen/arch/x86/msr.c | 8 ++++++-- > 1 file changed, 6 insertions(+), 2 deletions(-) > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > index 3f0450259cdf..964d500ff8a1 100644 > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > case MSR_K8_HWCR: > if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > goto gp_fault; > - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 > - ? K8_HWCR_TSC_FREQ_SEL : 0; > + /* > + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as > + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to > + * also poke at PSTATE0. > + */ While this is true, the justification for removing this is because TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. Also because it's addition without writing into the migration stream was bogus irrespective of the specifics of the bit. I'm still of the opinion that it's buggy for OpenBSD to be looking at model specific bits when virtualised, but given my latest reading of the AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it can see TSC_FREQ_SEL. In some theoretical future where the toolstack better understands MSRs and (non)migratable VMs (which is the QubesOS usecase), then it would in principle be fine to construct a VM which can see the host TSC_FREQ_SEL and PSTATE* values. Preferably with an adjusted comment, Reviewed-by: Andrew Cooper <andrew.cooper3@citrix.com> ~Andrew
On 12/09/2023 5:35 pm, Andrew Cooper wrote: > On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is >> set, and will also attempt to unconditionally access HWCR if the TSC is >> reported as Invariant. >> >> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a >> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not >> a suitable solution. >> >> In order to fix expose an empty HWCR. > At first I was thinking a straight up revert, but AMD's CPUID Faulting > is an architectural bit in here so it's worth keeping the register around. > >> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >> --- >> Not sure whether we want to expose something when is_cpufreq_controller() is >> true, seeing as there's a special wrmsr handler for the same MSR in that case. >> Likely should be done for PV only, but also likely quite bogus. >> >> Missing reported by as the issue came from the QubesOS tracker. > Well - we can at least have a: > > Link: https://github.com/QubesOS/qubes-issues/issues/8502 > > in the commit message, and it's probably worth asking Solène / Marek > (both CC'd) if they want a Reported-by tag. > >> --- >> xen/arch/x86/msr.c | 8 ++++++-- >> 1 file changed, 6 insertions(+), 2 deletions(-) >> >> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c >> index 3f0450259cdf..964d500ff8a1 100644 >> --- a/xen/arch/x86/msr.c >> +++ b/xen/arch/x86/msr.c >> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >> case MSR_K8_HWCR: >> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >> goto gp_fault; >> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >> - ? K8_HWCR_TSC_FREQ_SEL : 0; >> + /* >> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >> + * also poke at PSTATE0. >> + */ > While this is true, the justification for removing this is because > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. > > Also because it's addition without writing into the migration stream was > bogus irrespective of the specifics of the bit. > > I'm still of the opinion that it's buggy for OpenBSD to be looking at > model specific bits when virtualised, but given my latest reading of the > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it > can see TSC_FREQ_SEL. > > In some theoretical future where the toolstack better understands MSRs > and (non)migratable VMs (which is the QubesOS usecase), then it would in > principle be fine to construct a VM which can see the host TSC_FREQ_SEL > and PSTATE* values. > > Preferably with an adjusted comment, Reviewed-by: Andrew Cooper > <andrew.cooper3@citrix.com> Sorry - I meant to be clearer here. I'd suggest just deleting the comment and leaving an unconditional return of 0 (which will become conditional when we wire up CPUID Faulting). MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't fault. ~Andrew
On 12.09.2023 18:23, Roger Pau Monne wrote: > OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > set, and will also attempt to unconditionally access HWCR if the TSC is > reported as Invariant. > > The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > a suitable solution. Why is the warning bogus? The PPR doesn't even state what the bit being clear means; it's a r/o one. On respective families it cannot possibly be correct to expose it clear. > In order to fix expose an empty HWCR. > > Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > --- > Not sure whether we want to expose something when is_cpufreq_controller() is > true, seeing as there's a special wrmsr handler for the same MSR in that case. > Likely should be done for PV only, but also likely quite bogus. Well, keying to is_cpufreq_controller() is indeed questionable, but is there any reason to not minimally expose the bit correctly when a domain cannot migrate? Jan
On 12.09.2023 18:35, Andrew Cooper wrote: > On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >> --- a/xen/arch/x86/msr.c >> +++ b/xen/arch/x86/msr.c >> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >> case MSR_K8_HWCR: >> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >> goto gp_fault; >> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >> - ? K8_HWCR_TSC_FREQ_SEL : 0; >> + /* >> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >> + * also poke at PSTATE0. >> + */ > > While this is true, the justification for removing this is because > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. > > Also because it's addition without writing into the migration stream was > bogus irrespective of the specifics of the bit. > > I'm still of the opinion that it's buggy for OpenBSD to be looking at > model specific bits when virtualised, but given my latest reading of the > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it > can see TSC_FREQ_SEL. I'm afraid I'm still at a loss seeing what wording in which PPR makes you see a connection. If there was a connection, I'd like to ask that we at least properly consider exposing PSTATE0 (or if necessary all PSTATEn) as well, with zero value (i.e. in particular with the valid bit clear), rather than exposing a r/o bit with the wrong value, upsetting Linux. Jan
On Tue, Sep 12, 2023 at 05:35:15PM +0100, Andrew Cooper wrote: > On 12/09/2023 5:23 pm, Roger Pau Monne wrote: > > OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > > set, and will also attempt to unconditionally access HWCR if the TSC is > > reported as Invariant. > > > > The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > > (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > > a suitable solution. > > > > In order to fix expose an empty HWCR. > > At first I was thinking a straight up revert, but AMD's CPUID Faulting > is an architectural bit in here so it's worth keeping the register around. A straight up revert won't work, because (as you notice below) HWCR is architectural, so accesses must not fault. > > > > Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > Not sure whether we want to expose something when is_cpufreq_controller() is > > true, seeing as there's a special wrmsr handler for the same MSR in that case. > > Likely should be done for PV only, but also likely quite bogus. > > > > Missing reported by as the issue came from the QubesOS tracker. > > Well - we can at least have a: > > Link: https://github.com/QubesOS/qubes-issues/issues/8502 Sure. > in the commit message, and it's probably worth asking Solène / Marek > (both CC'd) if they want a Reported-by tag. I'm happy to add a Reported-by tag, just didn't have an email to use. > > --- > > xen/arch/x86/msr.c | 8 ++++++-- > > 1 file changed, 6 insertions(+), 2 deletions(-) > > > > diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > > index 3f0450259cdf..964d500ff8a1 100644 > > --- a/xen/arch/x86/msr.c > > +++ b/xen/arch/x86/msr.c > > @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > > case MSR_K8_HWCR: > > if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > > goto gp_fault; > > - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 > > - ? K8_HWCR_TSC_FREQ_SEL : 0; > > + /* > > + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as > > + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to > > + * also poke at PSTATE0. > > + */ > > While this is true, the justification for removing this is because > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. > > Also because it's addition without writing into the migration stream was > bogus irrespective of the specifics of the bit. > > I'm still of the opinion that it's buggy for OpenBSD to be looking at > model specific bits when virtualised, Well, I think we can argue that an OS is free to ignore the CPUID HV bit and still boot on Xen (even if that leads to non-ideal decisions). > but given my latest reading of the > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it > can see TSC_FREQ_SEL. Hm, there's no written down note that TSC_FREQ_SEL implies PSTATE0 to be available (and PSTATE0 is not an architectural MSR), but I can see how a guest can expect to fetch the P0 frequency if it sees TSC_FREQ_SEL. It might have been more fail safe to check for PSTATE_LIMIT not faulting before attempting to access PSTATE0. > In some theoretical future where the toolstack better understands MSRs > and (non)migratable VMs (which is the QubesOS usecase), then it would in > principle be fine to construct a VM which can see the host TSC_FREQ_SEL > and PSTATE* values. > > Preferably with an adjusted comment, Reviewed-by: Andrew Cooper > <andrew.cooper3@citrix.com> Thanks, will reply to other comments before taking the RB and resending. Roger.
On Tue, Sep 12, 2023 at 05:36:53PM +0100, Andrew Cooper wrote: > On 12/09/2023 5:35 pm, Andrew Cooper wrote: > > On 12/09/2023 5:23 pm, Roger Pau Monne wrote: > >> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > >> set, and will also attempt to unconditionally access HWCR if the TSC is > >> reported as Invariant. > >> > >> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > >> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > >> a suitable solution. > >> > >> In order to fix expose an empty HWCR. > > At first I was thinking a straight up revert, but AMD's CPUID Faulting > > is an architectural bit in here so it's worth keeping the register around. > > > >> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > >> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >> --- > >> Not sure whether we want to expose something when is_cpufreq_controller() is > >> true, seeing as there's a special wrmsr handler for the same MSR in that case. > >> Likely should be done for PV only, but also likely quite bogus. > >> > >> Missing reported by as the issue came from the QubesOS tracker. > > Well - we can at least have a: > > > > Link: https://github.com/QubesOS/qubes-issues/issues/8502 > > > > in the commit message, and it's probably worth asking Solène / Marek > > (both CC'd) if they want a Reported-by tag. > > > >> --- > >> xen/arch/x86/msr.c | 8 ++++++-- > >> 1 file changed, 6 insertions(+), 2 deletions(-) > >> > >> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c > >> index 3f0450259cdf..964d500ff8a1 100644 > >> --- a/xen/arch/x86/msr.c > >> +++ b/xen/arch/x86/msr.c > >> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > >> case MSR_K8_HWCR: > >> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > >> goto gp_fault; > >> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 > >> - ? K8_HWCR_TSC_FREQ_SEL : 0; > >> + /* > >> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as > >> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to > >> + * also poke at PSTATE0. > >> + */ > > While this is true, the justification for removing this is because > > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. > > > > Also because it's addition without writing into the migration stream was > > bogus irrespective of the specifics of the bit. > > > > I'm still of the opinion that it's buggy for OpenBSD to be looking at > > model specific bits when virtualised, but given my latest reading of the > > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it > > can see TSC_FREQ_SEL. > > > > In some theoretical future where the toolstack better understands MSRs > > and (non)migratable VMs (which is the QubesOS usecase), then it would in > > principle be fine to construct a VM which can see the host TSC_FREQ_SEL > > and PSTATE* values. > > > > Preferably with an adjusted comment, Reviewed-by: Andrew Cooper > > <andrew.cooper3@citrix.com> > > Sorry - I meant to be clearer here. I'd suggest just deleting the > comment and leaving an unconditional return of 0 (which will become > conditional when we wire up CPUID Faulting). > > MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't > fault. Hm, I think it's worth to at least keep a note that if TSC_FREQ_SEL is exposed PSTATE0 must also be exposed to prevent OpenBSD 7.3 from panicking. Thanks, Roger.
On Wed, Sep 13, 2023 at 08:14:46AM +0200, Jan Beulich wrote: > On 12.09.2023 18:23, Roger Pau Monne wrote: > > OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > > set, and will also attempt to unconditionally access HWCR if the TSC is > > reported as Invariant. > > > > The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > > (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > > a suitable solution. > > Why is the warning bogus? The PPR doesn't even state what the bit being > clear means; it's a r/o one. On respective families it cannot possibly > be correct to expose it clear. There are other bits in the same MSR that only state the meaning of one value and are not r/o, so my take is that being clear means "The TSC doesn't increment at the P0 frequency". Since it's model specific, it should be possible for some models to have the bit clear. > > In order to fix expose an empty HWCR. > > > > Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > > Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > > --- > > Not sure whether we want to expose something when is_cpufreq_controller() is > > true, seeing as there's a special wrmsr handler for the same MSR in that case. > > Likely should be done for PV only, but also likely quite bogus. > > Well, keying to is_cpufreq_controller() is indeed questionable, but is > there any reason to not minimally expose the bit correctly when a > domain cannot migrate? We would then also need to expose PSTATE0 at least to make OpenBSD happy (and any otehr guest that makes the connection between TscFreqSel and getting the P0 frequency from PSTATE0. Thanks, Roger.
On 13.09.2023 10:20, Roger Pau Monné wrote: > On Wed, Sep 13, 2023 at 08:14:46AM +0200, Jan Beulich wrote: >> On 12.09.2023 18:23, Roger Pau Monne wrote: >>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is >>> set, and will also attempt to unconditionally access HWCR if the TSC is >>> reported as Invariant. >>> >>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a >>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not >>> a suitable solution. >> >> Why is the warning bogus? The PPR doesn't even state what the bit being >> clear means; it's a r/o one. On respective families it cannot possibly >> be correct to expose it clear. > > There are other bits in the same MSR that only state the meaning of > one value and are not r/o, so my take is that being clear means "The > TSC doesn't increment at the P0 frequency". > > Since it's model specific, it should be possible for some models to > have the bit clear. The code that's in there right now already has a family >= 0x10 check. The Fam10 BKDG says (among other things) "BIOS should program this bit to 1." That, imo, doesn't leave much room for the bit being clear once an OS (or hypervisor) gains control from firmware. >>> In order to fix expose an empty HWCR. >>> >>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>> --- >>> Not sure whether we want to expose something when is_cpufreq_controller() is >>> true, seeing as there's a special wrmsr handler for the same MSR in that case. >>> Likely should be done for PV only, but also likely quite bogus. >> >> Well, keying to is_cpufreq_controller() is indeed questionable, but is >> there any reason to not minimally expose the bit correctly when a >> domain cannot migrate? > > We would then also need to expose PSTATE0 at least to make OpenBSD > happy (and any otehr guest that makes the connection between > TscFreqSel and getting the P0 frequency from PSTATE0. If any such OSes can be used as Dom0, yes. And as said before, I view exposing PSTATE0 (with zero value) as a viable alternative to your partial revert anyway. What varies across families is how many PSTATEn there are, but at least starting from Fam10 PSTATE0 looks to always be there, with the top bit indicating validity of the other defined bits. Jan
On Wed, Sep 13, 2023 at 08:18:57AM +0200, Jan Beulich wrote: > On 12.09.2023 18:35, Andrew Cooper wrote: > > On 12/09/2023 5:23 pm, Roger Pau Monne wrote: > >> --- a/xen/arch/x86/msr.c > >> +++ b/xen/arch/x86/msr.c > >> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) > >> case MSR_K8_HWCR: > >> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > >> goto gp_fault; > >> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 > >> - ? K8_HWCR_TSC_FREQ_SEL : 0; > >> + /* > >> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as > >> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to > >> + * also poke at PSTATE0. > >> + */ > > > > While this is true, the justification for removing this is because > > TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. > > > > Also because it's addition without writing into the migration stream was > > bogus irrespective of the specifics of the bit. > > > > I'm still of the opinion that it's buggy for OpenBSD to be looking at > > model specific bits when virtualised, but given my latest reading of the > > AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it > > can see TSC_FREQ_SEL. > > I'm afraid I'm still at a loss seeing what wording in which PPR makes you > see a connection. If there was a connection, I'd like to ask that we at > least properly consider exposing PSTATE0 (or if necessary all PSTATEn) as > well, with zero value (i.e. in particular with the valid bit clear), > rather than exposing a r/o bit with the wrong value, upsetting Linux. But PSTATEn is also non-architectural, so the bit meaning could in theory change between models. There seems to be no bit anywhere that signals whether PSTATEn is available, as it's my reading that you can have PSTATEn without the architectural PSTATE_{LIMIT,CTRL,STATUS} MSRs that are signaled by HW_PSTATE CPUID bit. Thanks, Roger.
On Wed, Sep 13, 2023 at 10:35:17AM +0200, Jan Beulich wrote: > On 13.09.2023 10:20, Roger Pau Monné wrote: > > On Wed, Sep 13, 2023 at 08:14:46AM +0200, Jan Beulich wrote: > >> On 12.09.2023 18:23, Roger Pau Monne wrote: > >>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is > >>> set, and will also attempt to unconditionally access HWCR if the TSC is > >>> reported as Invariant. > >>> > >>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a > >>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not > >>> a suitable solution. > >> > >> Why is the warning bogus? The PPR doesn't even state what the bit being > >> clear means; it's a r/o one. On respective families it cannot possibly > >> be correct to expose it clear. > > > > There are other bits in the same MSR that only state the meaning of > > one value and are not r/o, so my take is that being clear means "The > > TSC doesn't increment at the P0 frequency". > > > > Since it's model specific, it should be possible for some models to > > have the bit clear. > > The code that's in there right now already has a family >= 0x10 check. > The Fam10 BKDG says (among other things) "BIOS should program this bit > to 1." That, imo, doesn't leave much room for the bit being clear once > an OS (or hypervisor) gains control from firmware. > > >>> In order to fix expose an empty HWCR. > >>> > >>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') > >>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> > >>> --- > >>> Not sure whether we want to expose something when is_cpufreq_controller() is > >>> true, seeing as there's a special wrmsr handler for the same MSR in that case. > >>> Likely should be done for PV only, but also likely quite bogus. > >> > >> Well, keying to is_cpufreq_controller() is indeed questionable, but is > >> there any reason to not minimally expose the bit correctly when a > >> domain cannot migrate? > > > > We would then also need to expose PSTATE0 at least to make OpenBSD > > happy (and any otehr guest that makes the connection between > > TscFreqSel and getting the P0 frequency from PSTATE0. > > If any such OSes can be used as Dom0, yes. OpenBSD can't be used as dom0, but QubesOS does use the nomigrate flag for domUs. > And as said before, I view > exposing PSTATE0 (with zero value) as a viable alternative to your > partial revert anyway. What varies across families is how many PSTATEn > there are, but at least starting from Fam10 PSTATE0 looks to always be > there, with the top bit indicating validity of the other defined bits. I did consider this, but it seemed to just dig us deeper into exposing non-architectural MSRs, which in the long run is more likely to be troublesome if newer models change the meaning of bits in PSTATEn. Thanks, Roger.
On 13.09.2023 11:16, Roger Pau Monné wrote: > On Wed, Sep 13, 2023 at 10:35:17AM +0200, Jan Beulich wrote: >> On 13.09.2023 10:20, Roger Pau Monné wrote: >>> On Wed, Sep 13, 2023 at 08:14:46AM +0200, Jan Beulich wrote: >>>> On 12.09.2023 18:23, Roger Pau Monne wrote: >>>>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is >>>>> set, and will also attempt to unconditionally access HWCR if the TSC is >>>>> reported as Invariant. >>>>> >>>>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a >>>>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not >>>>> a suitable solution. >>>> >>>> Why is the warning bogus? The PPR doesn't even state what the bit being >>>> clear means; it's a r/o one. On respective families it cannot possibly >>>> be correct to expose it clear. >>> >>> There are other bits in the same MSR that only state the meaning of >>> one value and are not r/o, so my take is that being clear means "The >>> TSC doesn't increment at the P0 frequency". >>> >>> Since it's model specific, it should be possible for some models to >>> have the bit clear. >> >> The code that's in there right now already has a family >= 0x10 check. >> The Fam10 BKDG says (among other things) "BIOS should program this bit >> to 1." That, imo, doesn't leave much room for the bit being clear once >> an OS (or hypervisor) gains control from firmware. >> >>>>> In order to fix expose an empty HWCR. >>>>> >>>>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') >>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>>> --- >>>>> Not sure whether we want to expose something when is_cpufreq_controller() is >>>>> true, seeing as there's a special wrmsr handler for the same MSR in that case. >>>>> Likely should be done for PV only, but also likely quite bogus. >>>> >>>> Well, keying to is_cpufreq_controller() is indeed questionable, but is >>>> there any reason to not minimally expose the bit correctly when a >>>> domain cannot migrate? >>> >>> We would then also need to expose PSTATE0 at least to make OpenBSD >>> happy (and any otehr guest that makes the connection between >>> TscFreqSel and getting the P0 frequency from PSTATE0. >> >> If any such OSes can be used as Dom0, yes. > > OpenBSD can't be used as dom0, but QubesOS does use the nomigrate flag > for domUs. > >> And as said before, I view >> exposing PSTATE0 (with zero value) as a viable alternative to your >> partial revert anyway. What varies across families is how many PSTATEn >> there are, but at least starting from Fam10 PSTATE0 looks to always be >> there, with the top bit indicating validity of the other defined bits. > > I did consider this, but it seemed to just dig us deeper into exposing > non-architectural MSRs, which in the long run is more likely to be > troublesome if newer models change the meaning of bits in PSTATEn. Not sure about "deeper" - we're discussing to expose a non-architectural bit in HWCR with the wrong value vs exposing a non-architectural MSR where we'd rely on only the top bit retaining its meaning going forward. Jan
On 13/09/2023 9:08 am, Roger Pau Monné wrote: > On Tue, Sep 12, 2023 at 05:36:53PM +0100, Andrew Cooper wrote: >> On 12/09/2023 5:35 pm, Andrew Cooper wrote: >>> On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >>>> OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is >>>> set, and will also attempt to unconditionally access HWCR if the TSC is >>>> reported as Invariant. >>>> >>>> The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a >>>> (bogus) warning message, but doing so at the cost of OpenBSD not booting is not >>>> a suitable solution. >>>> >>>> In order to fix expose an empty HWCR. >>> At first I was thinking a straight up revert, but AMD's CPUID Faulting >>> is an architectural bit in here so it's worth keeping the register around. >>> >>>> Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') >>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> >>>> --- >>>> Not sure whether we want to expose something when is_cpufreq_controller() is >>>> true, seeing as there's a special wrmsr handler for the same MSR in that case. >>>> Likely should be done for PV only, but also likely quite bogus. >>>> >>>> Missing reported by as the issue came from the QubesOS tracker. >>> Well - we can at least have a: >>> >>> Link: https://github.com/QubesOS/qubes-issues/issues/8502 >>> >>> in the commit message, and it's probably worth asking Solène / Marek >>> (both CC'd) if they want a Reported-by tag. >>> >>>> --- >>>> xen/arch/x86/msr.c | 8 ++++++-- >>>> 1 file changed, 6 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c >>>> index 3f0450259cdf..964d500ff8a1 100644 >>>> --- a/xen/arch/x86/msr.c >>>> +++ b/xen/arch/x86/msr.c >>>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >>>> case MSR_K8_HWCR: >>>> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >>>> goto gp_fault; >>>> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >>>> - ? K8_HWCR_TSC_FREQ_SEL : 0; >>>> + /* >>>> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >>>> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >>>> + * also poke at PSTATE0. >>>> + */ >>> While this is true, the justification for removing this is because >>> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. >>> >>> Also because it's addition without writing into the migration stream was >>> bogus irrespective of the specifics of the bit. >>> >>> I'm still of the opinion that it's buggy for OpenBSD to be looking at >>> model specific bits when virtualised, but given my latest reading of the >>> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it >>> can see TSC_FREQ_SEL. >>> >>> In some theoretical future where the toolstack better understands MSRs >>> and (non)migratable VMs (which is the QubesOS usecase), then it would in >>> principle be fine to construct a VM which can see the host TSC_FREQ_SEL >>> and PSTATE* values. >>> >>> Preferably with an adjusted comment, Reviewed-by: Andrew Cooper >>> <andrew.cooper3@citrix.com> >> Sorry - I meant to be clearer here. I'd suggest just deleting the >> comment and leaving an unconditional return of 0 (which will become >> conditional when we wire up CPUID Faulting). >> >> MSR_HWCR *is* an architectural MSR on any 64bit AMD system, so shouldn't >> fault. > Hm, I think it's worth to at least keep a note that if TSC_FREQ_SEL is > exposed PSTATE0 must also be exposed to prevent OpenBSD 7.3 from > panicking. But there's nothing OpenBSD 7.3 specific about it. Any software which sees this bit is permitted (expected even) to edit the pstate registers. You know why it's called frequency select? Because firmware is supposed to program the systemwide frequency/voltage to the user's request for whether the system wants to run cooler, or be overclocked. Putting OpenBSD 7.3 in the commit message is absolutely relevant to why we're making this change now, but it's not relevant to why we have concluded that TSC_FREQ_SEL is bogus to expose to guests. ~Andrew
On 13/09/2023 8:50 am, Roger Pau Monné wrote: > On Tue, Sep 12, 2023 at 05:35:15PM +0100, Andrew Cooper wrote: >> On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >>> --- >>> xen/arch/x86/msr.c | 8 ++++++-- >>> 1 file changed, 6 insertions(+), 2 deletions(-) >>> >>> diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c >>> index 3f0450259cdf..964d500ff8a1 100644 >>> --- a/xen/arch/x86/msr.c >>> +++ b/xen/arch/x86/msr.c >>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >>> case MSR_K8_HWCR: >>> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >>> goto gp_fault; >>> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >>> - ? K8_HWCR_TSC_FREQ_SEL : 0; >>> + /* >>> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >>> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >>> + * also poke at PSTATE0. >>> + */ >> While this is true, the justification for removing this is because >> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. >> >> Also because it's addition without writing into the migration stream was >> bogus irrespective of the specifics of the bit. >> >> I'm still of the opinion that it's buggy for OpenBSD to be looking at >> model specific bits when virtualised, > Well, I think we can argue that an OS is free to ignore the CPUID HV > bit and still boot on Xen (even if that leads to non-ideal decisions). An OS which keeps itself to architectural details that we do our very best to be correct with, ought to function even if it ignores the HV bit. But we're (deliberately) not doing model-specific-accurate emulations of a specific platform. An OS which ignores details about the environment it is operating in gets to keep the faults/malfunctions when it does something illegal. >> but given my latest reading of the >> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it >> can see TSC_FREQ_SEL. > Hm, there's no written down note that TSC_FREQ_SEL implies PSTATE0 to > be available (and PSTATE0 is not an architectural MSR), but I can see > how a guest can expect to fetch the P0 frequency if it sees > TSC_FREQ_SEL. The PPR is a reference of mostly autogenerated details and misc notes, put together in a non- hand-write way, unlike the older BKWGs. Lots of the information elided from public and partner-NDA versions is "see TICKET/LINK for rational" type comments. It is not a spec - it is a reference (the clue is even in the name) aimed at people already familiar with the area. Do not fall into the trap of thinking it it can be read as a spec. ~Andrew
On 13/09/2023 7:18 am, Jan Beulich wrote: > On 12.09.2023 18:35, Andrew Cooper wrote: >> On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >>> --- a/xen/arch/x86/msr.c >>> +++ b/xen/arch/x86/msr.c >>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >>> case MSR_K8_HWCR: >>> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >>> goto gp_fault; >>> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >>> - ? K8_HWCR_TSC_FREQ_SEL : 0; >>> + /* >>> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >>> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >>> + * also poke at PSTATE0. >>> + */ >> While this is true, the justification for removing this is because >> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. >> >> Also because it's addition without writing into the migration stream was >> bogus irrespective of the specifics of the bit. >> >> I'm still of the opinion that it's buggy for OpenBSD to be looking at >> model specific bits when virtualised, but given my latest reading of the >> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it >> can see TSC_FREQ_SEL. > I'm afraid I'm still at a loss seeing what wording in which PPR makes you > see a connection. If there was a connection, I'd like to ask that we at > least properly consider exposing PSTATE0 (or if necessary all PSTATEn) as > well, with zero value (i.e. in particular with the valid bit clear), > rather than exposing a r/o bit with the wrong value, upsetting Linux. This mess is caused by the erroneous advertising of a model specific bit, and there's no way in hell that giving the guest even more model specific data is going to fix it. The PSTATE MSRs are entirely model specific, fully read/write, and the Enable bit is not an enable bit; its a "not valid yet" bit that firmware is required to adjust to be consistent across the coherency fabric. Linux is simply wrong with it's printk() under virt, and wants adjusting. ~Andrew
On 13.09.2023 12:50, Andrew Cooper wrote: > On 13/09/2023 8:50 am, Roger Pau Monné wrote: >> Hm, there's no written down note that TSC_FREQ_SEL implies PSTATE0 to >> be available (and PSTATE0 is not an architectural MSR), but I can see >> how a guest can expect to fetch the P0 frequency if it sees >> TSC_FREQ_SEL. > > The PPR is a reference of mostly autogenerated details and misc notes, > put together in a non- hand-write way, unlike the older BKWGs. > > Lots of the information elided from public and partner-NDA versions is > "see TICKET/LINK for rational" type comments. > > It is not a spec - it is a reference (the clue is even in the name) > aimed at people already familiar with the area. Do not fall into the > trap of thinking it it can be read as a spec. But then where is it written down that the bit set implies the PSTATEn MSRs to exist? Jan
On 13.09.2023 13:02, Andrew Cooper wrote: > On 13/09/2023 7:18 am, Jan Beulich wrote: >> On 12.09.2023 18:35, Andrew Cooper wrote: >>> On 12/09/2023 5:23 pm, Roger Pau Monne wrote: >>>> --- a/xen/arch/x86/msr.c >>>> +++ b/xen/arch/x86/msr.c >>>> @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) >>>> case MSR_K8_HWCR: >>>> if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >>>> goto gp_fault; >>>> - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 >>>> - ? K8_HWCR_TSC_FREQ_SEL : 0; >>>> + /* >>>> + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as >>>> + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to >>>> + * also poke at PSTATE0. >>>> + */ >>> While this is true, the justification for removing this is because >>> TSC_FREQ_SEL is a model specific bit, not an architectural bit in HWCR. >>> >>> Also because it's addition without writing into the migration stream was >>> bogus irrespective of the specifics of the bit. >>> >>> I'm still of the opinion that it's buggy for OpenBSD to be looking at >>> model specific bits when virtualised, but given my latest reading of the >>> AMD manuals, I think OpenBSD *is* well behaved looking at PSTATE0 if it >>> can see TSC_FREQ_SEL. >> I'm afraid I'm still at a loss seeing what wording in which PPR makes you >> see a connection. If there was a connection, I'd like to ask that we at >> least properly consider exposing PSTATE0 (or if necessary all PSTATEn) as >> well, with zero value (i.e. in particular with the valid bit clear), >> rather than exposing a r/o bit with the wrong value, upsetting Linux. > > This mess is caused by the erroneous advertising of a model specific > bit, and there's no way in hell that giving the guest even more model > specific data is going to fix it. > > The PSTATE MSRs are entirely model specific, fully read/write, and the > Enable bit is not an enable bit; its a "not valid yet" bit that firmware > is required to adjust to be consistent across the coherency fabric. > > Linux is simply wrong with it's printk() under virt, and wants adjusting. Assuming Roger agrees with this statement, then I think this is another item wanting mentioning in the description. I then still wouldn't ack the change, but I also wouldn't object to it anymore. Jan
On Wed, Sep 13 2023 at 12:02, Andrew Cooper wrote: > The PSTATE MSRs are entirely model specific, fully read/write, and the > Enable bit is not an enable bit; its a "not valid yet" bit that firmware > is required to adjust to be consistent across the coherency fabric. > > Linux is simply wrong with it's printk() under virt, and wants adjusting. No objections from my side. Thanks, tglx
diff --git a/xen/arch/x86/msr.c b/xen/arch/x86/msr.c index 3f0450259cdf..964d500ff8a1 100644 --- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -240,8 +240,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t msr, uint64_t *val) case MSR_K8_HWCR: if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) goto gp_fault; - *val = get_cpu_family(cp->basic.raw_fms, NULL, NULL) >= 0x10 - ? K8_HWCR_TSC_FREQ_SEL : 0; + /* + * OpenBSD 7.3 accesses HWCR unconditionally if the TSC is reported as + * Invariant. Do not set TSC_FREQ_SEL as that would trigger OpenBSD to + * also poke at PSTATE0. + */ + *val = 0; break; case MSR_VIRT_SPEC_CTRL:
OpenBSD will attempt to unconditionally access PSTATE0 if HWCR.TscFreqSel is set, and will also attempt to unconditionally access HWCR if the TSC is reported as Invariant. The reasoning for exposing HWCR.TscFreqSel was to avoid Linux from printing a (bogus) warning message, but doing so at the cost of OpenBSD not booting is not a suitable solution. In order to fix expose an empty HWCR. Fixes: 14b95b3b8546 ('x86/AMD: expose HWCR.TscFreqSel to guests') Signed-off-by: Roger Pau Monné <roger.pau@citrix.com> --- Not sure whether we want to expose something when is_cpufreq_controller() is true, seeing as there's a special wrmsr handler for the same MSR in that case. Likely should be done for PV only, but also likely quite bogus. Missing reported by as the issue came from the QubesOS tracker. --- xen/arch/x86/msr.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)