Message ID | c91b190a-aaa1-d3b8-10eb-d8da7ad1f834@suse.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86: guest MSR access handling tweaks | expand |
On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote: > Linux has been warning ("firmware bug") about this bit being clear for a > long time. While writable in older hardware it has been readonly on more > than just most recent hardware. For simplicitly report it always set (if > anything we may want to log the issue ourselves if it turns out to be > clear on older hardware). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> One question below. > --- > v2: New. > --- > There are likely more bits worthwhile to expose, but for about every one > of them there would be the risk of a lengthy discussion, as there are > clear downsides to exposing such information, the more that it would be > tbd whether the hardware values should be surfaced, and if so what > should happen when the guest gets migrated. > > The main risk with making the read not fault here is that guests might > imply they can also write this MSR then. > > --- a/xen/arch/x86/msr.c > +++ b/xen/arch/x86/msr.c > @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t > *val = msrs->tsc_aux; > break; > > + case MSR_K8_HWCR: > + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > + goto gp_fault; > + *val = K8_HWCR_TSC_FREQ_SEL; I've been only able to find information about this MSR up to family 10h, but I think in theory Xen might also run on family 0Fh, do you know if the MSR is present there, and the bit has the same meaning? > + break; > + > case MSR_AMD64_DE_CFG: > if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > goto gp_fault; > --- a/xen/include/asm-x86/msr-index.h > +++ b/xen/include/asm-x86/msr-index.h > @@ -287,6 +287,8 @@ > > #define MSR_K7_HWCR 0xc0010015 We could likely drop the K7 define here, as Xen won't be able to run on K7 hardware anymore AFAICT. Thanks, Roger.
On 08.03.2021 12:37, Roger Pau Monné wrote: > On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote: >> Linux has been warning ("firmware bug") about this bit being clear for a >> long time. While writable in older hardware it has been readonly on more >> than just most recent hardware. For simplicitly report it always set (if >> anything we may want to log the issue ourselves if it turns out to be >> clear on older hardware). >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Thanks. > One question below. > >> --- >> v2: New. >> --- >> There are likely more bits worthwhile to expose, but for about every one >> of them there would be the risk of a lengthy discussion, as there are >> clear downsides to exposing such information, the more that it would be >> tbd whether the hardware values should be surfaced, and if so what >> should happen when the guest gets migrated. >> >> The main risk with making the read not fault here is that guests might >> imply they can also write this MSR then. >> >> --- a/xen/arch/x86/msr.c >> +++ b/xen/arch/x86/msr.c >> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t >> *val = msrs->tsc_aux; >> break; >> >> + case MSR_K8_HWCR: >> + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >> + goto gp_fault; >> + *val = K8_HWCR_TSC_FREQ_SEL; > > I've been only able to find information about this MSR up to family > 10h, but I think in theory Xen might also run on family 0Fh, do you > know if the MSR is present there, and the bit has the same meaning? From its name (and its K7 alternative name) it's clear the register had been there at that point. And indeed the bit has a different meaning there (its the bottom bit of a 6-bit START_FID field if the BKDG I'm looking at can be trusted. Since I don't think it matters much whether we expose a value of 0x00 or a value of 0x01 there, and since we likely don't want to make #GP raising dependent upon family when we don't _really_ need to, I would want to propose that the value used is good enough uniformly. >> --- a/xen/include/asm-x86/msr-index.h >> +++ b/xen/include/asm-x86/msr-index.h >> @@ -287,6 +287,8 @@ >> >> #define MSR_K7_HWCR 0xc0010015 > > We could likely drop the K7 define here, as Xen won't be able to run > on K7 hardware anymore AFAICT. Indeed, but not at this point in time. Jan
On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote: > On 08.03.2021 12:37, Roger Pau Monné wrote: > > On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote: > >> Linux has been warning ("firmware bug") about this bit being clear for a > >> long time. While writable in older hardware it has been readonly on more > >> than just most recent hardware. For simplicitly report it always set (if > >> anything we may want to log the issue ourselves if it turns out to be > >> clear on older hardware). > >> > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > > > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> > > Thanks. > > > One question below. > > > >> --- > >> v2: New. > >> --- > >> There are likely more bits worthwhile to expose, but for about every one > >> of them there would be the risk of a lengthy discussion, as there are > >> clear downsides to exposing such information, the more that it would be > >> tbd whether the hardware values should be surfaced, and if so what > >> should happen when the guest gets migrated. > >> > >> The main risk with making the read not fault here is that guests might > >> imply they can also write this MSR then. > >> > >> --- a/xen/arch/x86/msr.c > >> +++ b/xen/arch/x86/msr.c > >> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t > >> *val = msrs->tsc_aux; > >> break; > >> > >> + case MSR_K8_HWCR: > >> + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) > >> + goto gp_fault; > >> + *val = K8_HWCR_TSC_FREQ_SEL; > > > > I've been only able to find information about this MSR up to family > > 10h, but I think in theory Xen might also run on family 0Fh, do you > > know if the MSR is present there, and the bit has the same meaning? > > From its name (and its K7 alternative name) it's clear the register > had been there at that point. And indeed the bit has a different > meaning there (its the bottom bit of a 6-bit START_FID field if the > BKDG I'm looking at can be trusted. OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I can find is for Family 10h [0]. > Since I don't think it matters > much whether we expose a value of 0x00 or a value of 0x01 there, > and since we likely don't want to make #GP raising dependent upon > family when we don't _really_ need to, I would want to propose that > the value used is good enough uniformly. I would be fine with setting it to 0 if Fam < 10h if you think that's acceptable. I think the chances of someone running Xen >= 4.15 on such old hardware are quite dim. Thanks, Roger. [0] https://developer.amd.com/resources/developer-guides-manuals/
On 05/03/2021 09:50, Jan Beulich wrote: > Linux has been warning ("firmware bug") about this bit being clear for a > long time. While writable in older hardware it has been readonly on more > than just most recent hardware. For simplicitly report it always set (if > anything we may want to log the issue ourselves if it turns out to be > clear on older hardware). > > Signed-off-by: Jan Beulich <jbeulich@suse.com> I realise Linux is complaining, but simply setting the bit isn't a fix. This needs corresponding updates in the ACPI tables, as well as Pstate MSRs, or Linux will derive a false relationship between the TSC rate and wallclock. ~Andrew
On Mon, Mar 08, 2021 at 12:41:26PM +0000, Andrew Cooper wrote: > On 05/03/2021 09:50, Jan Beulich wrote: > > Linux has been warning ("firmware bug") about this bit being clear for a > > long time. While writable in older hardware it has been readonly on more > > than just most recent hardware. For simplicitly report it always set (if > > anything we may want to log the issue ourselves if it turns out to be > > clear on older hardware). > > > > Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I realise Linux is complaining, but simply setting the bit isn't a fix. > > This needs corresponding updates in the ACPI tables, as well as Pstate > MSRs, or Linux will derive a false relationship between the TSC rate and > wallclock. Is there any description of those relations? I don't seem to find any other MSR referencing the TscFreqSel bit in HWCR on the AMD Open-Source Register Reference, but I might be looking at the wrong place. Thanks, Roger.
On 08.03.2021 13:41, Andrew Cooper wrote: > On 05/03/2021 09:50, Jan Beulich wrote: >> Linux has been warning ("firmware bug") about this bit being clear for a >> long time. While writable in older hardware it has been readonly on more >> than just most recent hardware. For simplicitly report it always set (if >> anything we may want to log the issue ourselves if it turns out to be >> clear on older hardware). >> >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > I realise Linux is complaining, but simply setting the bit isn't a fix. > > This needs corresponding updates in the ACPI tables, as well as Pstate > MSRs, or Linux will derive a false relationship between the TSC rate and > wallclock. I guess I don't follow: AMD's doc is very clear: BIOSes ought to set the bit. It not being set is more likely a mistake than an indication of other pieces (MSRs, ACPI tables) reflecting this unintended state. Plus isn't what you say true also if Linux sees the bit wrongly clear (which would be the case prior to this patch)? Are you suggesting we should revert behavior here all the way to letting the hardware bit shine through again (for Dom0; for DomU neither other MSRs nor ACPI tables are possibly aware of this bit's state)? Jan
On 08.03.2021 13:29, Roger Pau Monné wrote: > On Mon, Mar 08, 2021 at 12:47:44PM +0100, Jan Beulich wrote: >> On 08.03.2021 12:37, Roger Pau Monné wrote: >>> On Fri, Mar 05, 2021 at 10:50:54AM +0100, Jan Beulich wrote: >>>> --- a/xen/arch/x86/msr.c >>>> +++ b/xen/arch/x86/msr.c >>>> @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t >>>> *val = msrs->tsc_aux; >>>> break; >>>> >>>> + case MSR_K8_HWCR: >>>> + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) >>>> + goto gp_fault; >>>> + *val = K8_HWCR_TSC_FREQ_SEL; >>> >>> I've been only able to find information about this MSR up to family >>> 10h, but I think in theory Xen might also run on family 0Fh, do you >>> know if the MSR is present there, and the bit has the same meaning? >> >> From its name (and its K7 alternative name) it's clear the register >> had been there at that point. And indeed the bit has a different >> meaning there (its the bottom bit of a 6-bit START_FID field if the >> BKDG I'm looking at can be trusted. > > OK, I cannot seem to find the BKDG for family 0Fh. The oldest BKDG I > can find is for Family 10h [0]. > >> Since I don't think it matters >> much whether we expose a value of 0x00 or a value of 0x01 there, >> and since we likely don't want to make #GP raising dependent upon >> family when we don't _really_ need to, I would want to propose that >> the value used is good enough uniformly. > > I would be fine with setting it to 0 if Fam < 10h if you think that's > acceptable. I think the chances of someone running Xen >= 4.15 on such > old hardware are quite dim. Would you mind explaining how returning 0 in this case would be better? No hard-coded value will ever be guaranteed to reflect the truth. See my reply to Andrew - if anything we'd need to let the hardware field shine through, and in _that_ case I of course I agree that we then should treat Fam0F specially. I will admit though that as per the BKDG I'm looking at only even values are defined for the field. Reporting 1 here therefore may do good (keep OSes from trying to use any of this P-state stuff) or bad (confuse OSes). Jan
--- a/xen/arch/x86/msr.c +++ b/xen/arch/x86/msr.c @@ -315,6 +315,12 @@ int guest_rdmsr(struct vcpu *v, uint32_t *val = msrs->tsc_aux; break; + case MSR_K8_HWCR: + if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) + goto gp_fault; + *val = K8_HWCR_TSC_FREQ_SEL; + break; + case MSR_AMD64_DE_CFG: if ( !(cp->x86_vendor & (X86_VENDOR_AMD | X86_VENDOR_HYGON)) ) goto gp_fault; --- a/xen/include/asm-x86/msr-index.h +++ b/xen/include/asm-x86/msr-index.h @@ -287,6 +287,8 @@ #define MSR_K7_HWCR 0xc0010015 #define MSR_K8_HWCR 0xc0010015 +#define K8_HWCR_TSC_FREQ_SEL (1ULL << 24) + #define MSR_K7_FID_VID_CTL 0xc0010041 #define MSR_K7_FID_VID_STATUS 0xc0010042 #define MSR_K8_PSTATE_LIMIT 0xc0010061
Linux has been warning ("firmware bug") about this bit being clear for a long time. While writable in older hardware it has been readonly on more than just most recent hardware. For simplicitly report it always set (if anything we may want to log the issue ourselves if it turns out to be clear on older hardware). Signed-off-by: Jan Beulich <jbeulich@suse.com> --- v2: New. --- There are likely more bits worthwhile to expose, but for about every one of them there would be the risk of a lengthy discussion, as there are clear downsides to exposing such information, the more that it would be tbd whether the hardware values should be surfaced, and if so what should happen when the guest gets migrated. The main risk with making the read not fault here is that guests might imply they can also write this MSR then.