Message ID | edecdda9-4728-4c65-9f31-76c912a433d7@suse.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | x86/CPU: don't hard-code MTRR availability | expand |
On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: > In particular if we're running virtualized, the underlying hypervisor > (which may be another Xen) may not surface MTRRs, and offer PAT only. At least for Xen, I think we offer MTRR uniformly, even on PVH guests? > Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> My main concern is whether the !mtrr path is still functional. Have you tried booting the resulting hypervisor with MTRR masked on CPUID? (or alternatively short-circuiting cpu_has_mtrr == 0?) Thanks, Roger.
On 27.03.2025 09:21, Roger Pau Monné wrote: > On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: >> In particular if we're running virtualized, the underlying hypervisor >> (which may be another Xen) may not surface MTRRs, and offer PAT only. > > At least for Xen, I think we offer MTRR uniformly, even on PVH > guests? By default we do, but we discussed the option of offering PAT-only environments beyond leaving it open whether people disabling MTRR support in their guest config are outside of supported terrain. >> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > Acked-by: Roger Pau Monné <roger.pau@citrix.com> > > My main concern is whether the !mtrr path is still functional. Have > you tried booting the resulting hypervisor with MTRR masked on CPUID? > > (or alternatively short-circuiting cpu_has_mtrr == 0?) I didn't think this would be a prereq here. If we boot in an environment truly lacking MTRR, we'll crash on the first MTRR MSR access - none of those accesses use the safe accessors. Since you asked, I tried booting with "cpuid=no-mtrr". As I'm doing this on a system with console, all I can say is that it takes minutes to reach the point where we'd start setting up Dom0 (which itself then takes so long that I timed out waiting for it to make progress), due to all screen output becoming unbelievably slow after AP bringup. Surely something's screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on AP bringup having completed. I actually suspect it's not just VRAM accesses which are slow, but that we're left running in uncachable mode altogether for whatever reason. What this maybe useful for is to figure out the reason of "Platform timer appears to have unexpectedly wrapped <N> times", which I saw appear once. Given this, I'm actually uncertain whether it is legitimate then to take your ack. Jan
On 27.03.2025 10:15, Jan Beulich wrote: > On 27.03.2025 09:21, Roger Pau Monné wrote: >> My main concern is whether the !mtrr path is still functional. Have >> you tried booting the resulting hypervisor with MTRR masked on CPUID? >> >> (or alternatively short-circuiting cpu_has_mtrr == 0?) > > I didn't think this would be a prereq here. If we boot in an environment truly > lacking MTRR, we'll crash on the first MTRR MSR access - none of those accesses > use the safe accessors. Since you asked, I tried booting with "cpuid=no-mtrr". > As I'm doing this on a system with console, all I can say is that it takes > minutes to reach the point where we'd start setting up Dom0 (which itself then > takes so long that I timed out waiting for it to make progress), due to all > screen output becoming unbelievably slow after AP bringup. Surely something's > screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on AP > bringup having completed. I actually suspect it's not just VRAM accesses which > are slow, but that we're left running in uncachable mode altogether for whatever > reason. I think found a trivial way to avoid this, and the change there imo makes sense even if I can't explain this particular aspect of it. I'll submit that independently. Jan
On Thu, Mar 27, 2025 at 10:15:03AM +0100, Jan Beulich wrote: > On 27.03.2025 09:21, Roger Pau Monné wrote: > > On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: > >> In particular if we're running virtualized, the underlying hypervisor > >> (which may be another Xen) may not surface MTRRs, and offer PAT only. > > > > At least for Xen, I think we offer MTRR uniformly, even on PVH > > guests? > > By default we do, but we discussed the option of offering PAT-only environments > beyond leaving it open whether people disabling MTRR support in their guest > config are outside of supported terrain. > > >> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") > >> Signed-off-by: Jan Beulich <jbeulich@suse.com> > > > > Acked-by: Roger Pau Monné <roger.pau@citrix.com> > > > > My main concern is whether the !mtrr path is still functional. Have > > you tried booting the resulting hypervisor with MTRR masked on CPUID? > > > > (or alternatively short-circuiting cpu_has_mtrr == 0?) > > I didn't think this would be a prereq here. If we boot in an environment truly > lacking MTRR, we'll crash on the first MTRR MSR access - none of those accesses > use the safe accessors. Right, hopefully we don't have unprotected MTRR MSR accesses, so cpu_has_mtrr == 0 should prevent those. > Since you asked, I tried booting with "cpuid=no-mtrr". > As I'm doing this on a system with console, all I can say is that it takes > minutes to reach the point where we'd start setting up Dom0 (which itself then > takes so long that I timed out waiting for it to make progress), due to all > screen output becoming unbelievably slow after AP bringup. Surely something's > screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on AP > bringup having completed. I actually suspect it's not just VRAM accesses which > are slow, but that we're left running in uncachable mode altogether for whatever > reason. > > What this maybe useful for is to figure out the reason of "Platform timer > appears to have unexpectedly wrapped <N> times", which I saw appear once. > > Given this, I'm actually uncertain whether it is legitimate then to take your > ack. I think it might be best if we can figure out what causes those issues (and possibly fix them) before taking this patch? Albeit you could argue that running excruciatingly slow is better than just crashing of an unhandled #GP from a rdmsr. Thanks, Roger.
On 27.03.2025 10:50, Roger Pau Monné wrote: > On Thu, Mar 27, 2025 at 10:15:03AM +0100, Jan Beulich wrote: >> On 27.03.2025 09:21, Roger Pau Monné wrote: >>> On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: >>>> In particular if we're running virtualized, the underlying hypervisor >>>> (which may be another Xen) may not surface MTRRs, and offer PAT only. >>> >>> At least for Xen, I think we offer MTRR uniformly, even on PVH >>> guests? >> >> By default we do, but we discussed the option of offering PAT-only environments >> beyond leaving it open whether people disabling MTRR support in their guest >> config are outside of supported terrain. >> >>>> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> >>> >>> Acked-by: Roger Pau Monné <roger.pau@citrix.com> >>> >>> My main concern is whether the !mtrr path is still functional. Have >>> you tried booting the resulting hypervisor with MTRR masked on CPUID? >>> >>> (or alternatively short-circuiting cpu_has_mtrr == 0?) >> >> I didn't think this would be a prereq here. If we boot in an environment truly >> lacking MTRR, we'll crash on the first MTRR MSR access - none of those accesses >> use the safe accessors. > > Right, hopefully we don't have unprotected MTRR MSR accesses, so > cpu_has_mtrr == 0 should prevent those. Actually we do, see my other patch just posted. >> Since you asked, I tried booting with "cpuid=no-mtrr". >> As I'm doing this on a system with console, all I can say is that it takes >> minutes to reach the point where we'd start setting up Dom0 (which itself then >> takes so long that I timed out waiting for it to make progress), due to all >> screen output becoming unbelievably slow after AP bringup. Surely something's >> screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on AP >> bringup having completed. I actually suspect it's not just VRAM accesses which >> are slow, but that we're left running in uncachable mode altogether for whatever >> reason. >> >> What this maybe useful for is to figure out the reason of "Platform timer >> appears to have unexpectedly wrapped <N> times", which I saw appear once. >> >> Given this, I'm actually uncertain whether it is legitimate then to take your >> ack. > > I think it might be best if we can figure out what causes those issues > (and possibly fix them) before taking this patch? > > Albeit you could argue that running excruciatingly slow is better than > just crashing of an unhandled #GP from a rdmsr. Indeed that's my thinking. But if you prefer, I can wait with this patch until after the other one has gone in. Jan
On Thu, Mar 27, 2025 at 10:59:58AM +0100, Jan Beulich wrote: > On 27.03.2025 10:50, Roger Pau Monné wrote: > > On Thu, Mar 27, 2025 at 10:15:03AM +0100, Jan Beulich wrote: > >> On 27.03.2025 09:21, Roger Pau Monné wrote: > >>> On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: > >>>> In particular if we're running virtualized, the underlying hypervisor > >>>> (which may be another Xen) may not surface MTRRs, and offer PAT only. > >>> > >>> At least for Xen, I think we offer MTRR uniformly, even on PVH > >>> guests? > >> > >> By default we do, but we discussed the option of offering PAT-only environments > >> beyond leaving it open whether people disabling MTRR support in their guest > >> config are outside of supported terrain. > >> > >>>> Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") > >>>> Signed-off-by: Jan Beulich <jbeulich@suse.com> > >>> > >>> Acked-by: Roger Pau Monné <roger.pau@citrix.com> > >>> > >>> My main concern is whether the !mtrr path is still functional. Have > >>> you tried booting the resulting hypervisor with MTRR masked on CPUID? > >>> > >>> (or alternatively short-circuiting cpu_has_mtrr == 0?) > >> > >> I didn't think this would be a prereq here. If we boot in an environment truly > >> lacking MTRR, we'll crash on the first MTRR MSR access - none of those accesses > >> use the safe accessors. > > > > Right, hopefully we don't have unprotected MTRR MSR accesses, so > > cpu_has_mtrr == 0 should prevent those. > > Actually we do, see my other patch just posted. Oh, yes, that was one of my concerns, but I've failed to spot it. > >> Since you asked, I tried booting with "cpuid=no-mtrr". > >> As I'm doing this on a system with console, all I can say is that it takes > >> minutes to reach the point where we'd start setting up Dom0 (which itself then > >> takes so long that I timed out waiting for it to make progress), due to all > >> screen output becoming unbelievably slow after AP bringup. Surely something's > >> screwed somewhere, as VRAM accesses being slow (or fast) shouldn't depend on AP > >> bringup having completed. I actually suspect it's not just VRAM accesses which > >> are slow, but that we're left running in uncachable mode altogether for whatever > >> reason. > >> > >> What this maybe useful for is to figure out the reason of "Platform timer > >> appears to have unexpectedly wrapped <N> times", which I saw appear once. > >> > >> Given this, I'm actually uncertain whether it is legitimate then to take your > >> ack. > > > > I think it might be best if we can figure out what causes those issues > > (and possibly fix them) before taking this patch? > > > > Albeit you could argue that running excruciatingly slow is better than > > just crashing of an unhandled #GP from a rdmsr. > > Indeed that's my thinking. But if you prefer, I can wait with this patch until > after the other one has gone in. Given the small context of the other patch, you might as well put both in together. Thanks, Roger.
On Tue, Mar 25, 2025 at 08:18:11AM +0100, Jan Beulich wrote: > In particular if we're running virtualized, the underlying hypervisor > (which may be another Xen) may not surface MTRRs, and offer PAT only. > > Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") > Signed-off-by: Jan Beulich <jbeulich@suse.com> Acked-by: Roger Pau Monné <roger.pau@citrix.com> Thanks, Roger.
--- a/xen/arch/x86/include/asm/cpufeature.h +++ b/xen/arch/x86/include/asm/cpufeature.h @@ -70,7 +70,7 @@ static inline bool boot_cpu_has(unsigned #define cpu_has_pse 1 #define cpu_has_apic boot_cpu_has(X86_FEATURE_APIC) #define cpu_has_sep boot_cpu_has(X86_FEATURE_SEP) -#define cpu_has_mtrr 1 +#define cpu_has_mtrr boot_cpu_has(X86_FEATURE_MTRR) #define cpu_has_pge 1 #define cpu_has_pse36 boot_cpu_has(X86_FEATURE_PSE36) #define cpu_has_clflush boot_cpu_has(X86_FEATURE_CLFLUSH)
In particular if we're running virtualized, the underlying hypervisor (which may be another Xen) may not surface MTRRs, and offer PAT only. Fixes: 5a281883cdc3 ("Hardcode many cpu features for x86/64 -- we know 64-bit") Signed-off-by: Jan Beulich <jbeulich@suse.com>