[v5,00/16] x86: make PAT and MTRR independent from each other

Message ID 20221102074713.21493-1-jgross@suse.com

Message

Juergen Gross Nov. 2, 2022, 7:46 a.m. UTC
Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as a Xen PV
guest. In that case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.

The same applies to a kernel built with no MTRR support: it won't
allow the PAT MSR to be used, even though there is no technical reason
for that, other than the fact that setting up PAT the same way on all
CPUs (a requirement of the processor's cache management) relies on
some MTRR-specific code.

Fix all of that by:

- moving the function needed by PAT from MTRR-specific code one level
  up
- reworking the init sequences of MTRR and PAT to be more similar to
  each other without calling PAT from MTRR code
- removing the dependency of PAT on MTRR

There is some additional cleanup included, reducing code size.
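
To illustrate the result, here is a minimal sketch of the per-CPU init
path after the series (simplified from cache_cpu_init() in
arch/x86/kernel/cpu/cacheinfo.c as modified by these patches; the
CACHE_PAT flag and mtrr_generic_set_state() are named as I read them
from the series, so treat this as a sketch, not the literal code):

static void cache_cpu_init(void)
{
        unsigned long flags;
        struct cache_state state = { };

        local_irq_save(flags);
        cache_disable(&state);                  /* caches off, flush, MTRRs off */

        if (memory_caching_control & CACHE_MTRR)
                mtrr_generic_set_state();       /* MTRR setup, only if in use */

        if (memory_caching_control & CACHE_PAT)
                pat_cpu_init();                 /* PAT setup, independent of MTRR */

        cache_enable(&state);                   /* caches back on */
        local_irq_restore(flags);
}

The point being that PAT init is no longer called from MTRR code; each
feature is guarded by its own flag in memory_caching_control.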

Note that patches 1+2 have already been applied to tip.git x86/cpu.
They are included in this series only for reference.

Changes in V5:
- addressed comments

Changes in V4:
- new patches 10, 14, 15, 16
- split up old patch 4 into 3 patches
- addressed comments

Changes in V3:
- replace patch 1 by just adding a comment

Changes in V2:
- complete rework of the patches based on comments by Boris Petkov
- added several patches to the series

Juergen Gross (16):
  x86/mtrr: add comment for set_mtrr_state() serialization
  x86/mtrr: remove unused cyrix_set_all() function
  x86/mtrr: replace use_intel() with a local flag
  x86/mtrr: rename prepare_set() and post_set()
  x86/mtrr: split MTRR specific handling from cache dis/enabling
  x86: move some code out of arch/x86/kernel/cpu/mtrr
  x86/mtrr: Disentangle MTRR init from PAT init.
  x86/mtrr: remove set_all callback from struct mtrr_ops
  x86/mtrr: simplify mtrr_bp_init()
  x86/mtrr: get rid of __mtrr_enabled bool
  x86/mtrr: let cache_aps_delayed_init replace mtrr_aps_delayed_init
  x86/mtrr: add a stop_machine() handler calling only cache_cpu_init()
  x86: decouple PAT and MTRR handling
  x86: switch cache_ap_init() to hotplug callback
  x86: do MTRR/PAT setup on all secondary CPUs in parallel
  x86/mtrr: simplify mtrr_ops initialization

 arch/x86/include/asm/cacheinfo.h   |  19 ++++
 arch/x86/include/asm/memtype.h     |   5 +-
 arch/x86/include/asm/mtrr.h        |  17 +--
 arch/x86/kernel/cpu/cacheinfo.c    | 173 +++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/common.c       |   2 +-
 arch/x86/kernel/cpu/mtrr/amd.c     |   8 +-
 arch/x86/kernel/cpu/mtrr/centaur.c |   8 +-
 arch/x86/kernel/cpu/mtrr/cyrix.c   |  42 +------
 arch/x86/kernel/cpu/mtrr/generic.c | 127 ++++-----------------
 arch/x86/kernel/cpu/mtrr/mtrr.c    | 171 ++++------------------------
 arch/x86/kernel/cpu/mtrr/mtrr.h    |  15 +--
 arch/x86/kernel/setup.c            |  14 +--
 arch/x86/kernel/smpboot.c          |   9 +-
 arch/x86/mm/pat/memtype.c          | 152 +++++++++----------------
 arch/x86/power/cpu.c               |   3 +-
 include/linux/cpuhotplug.h         |   1 +
 16 files changed, 308 insertions(+), 458 deletions(-)

Comments

Borislav Petkov Nov. 2, 2022, 6:04 p.m. UTC | #1
On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
> Today PAT can't be used without MTRR being available, unless MTRR is at
> least configured via CONFIG_MTRR and the system is running as a Xen PV
> guest. In that case PAT is automatically available via the hypervisor,
> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
> 
> The same applies to a kernel built with no MTRR support: it won't
> allow the PAT MSR to be used, even though there is no technical reason
> for that, other than the fact that setting up PAT the same way on all
> CPUs (a requirement of the processor's cache management) relies on
> some MTRR-specific code.
> 
> Fix all of that by:

One of the AMD test boxes here says with this:

...
[    0.863466] PCI: not using MMCONFIG
[    0.863475] PCI: Using configuration type 1 for base access
[    0.863478] PCI: Using configuration type 1 for extended access
[    0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
[    0.866737] mtrr: probably your BIOS does not setup all CPUs.
[    0.866740] mtrr: corrected configuration.
[    0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
...

Previous logs don't have it:

PCI: not using MMCONFIG
PCI: Using configuration type 1 for base access
PCI: Using configuration type 1 for extended access
kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
Juergen Gross Nov. 3, 2022, 8:40 a.m. UTC | #2
On 02.11.22 19:04, Borislav Petkov wrote:
> On Wed, Nov 02, 2022 at 08:46:57AM +0100, Juergen Gross wrote:
>> Today PAT can't be used without MTRR being available, unless MTRR is at
>> least configured via CONFIG_MTRR and the system is running as a Xen PV
>> guest. In that case PAT is automatically available via the hypervisor,
>> but the PAT MSR can't be modified by the kernel and MTRR is disabled.
>>
>> The same applies to a kernel built with no MTRR support: it won't
>> allow the PAT MSR to be used, even though there is no technical reason
>> for that, other than the fact that setting up PAT the same way on all
>> CPUs (a requirement of the processor's cache management) relies on
>> some MTRR-specific code.
>>
>> Fix all of that by:
> 
> One of the AMD test boxes here says with this:
> 
> ...
> [    0.863466] PCI: not using MMCONFIG
> [    0.863475] PCI: Using configuration type 1 for base access
> [    0.863478] PCI: Using configuration type 1 for extended access
> [    0.866733] mtrr: your CPUs had inconsistent MTRRdefType settings
> [    0.866737] mtrr: probably your BIOS does not setup all CPUs.
> [    0.866740] mtrr: corrected configuration.
> [    0.869350] kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> ...
> 
> Previous logs don't have it:
> 
> PCI: not using MMCONFIG
> PCI: Using configuration type 1 for base access
> PCI: Using configuration type 1 for extended access
> kprobes: kprobe jump-optimization is enabled. All kprobes are optimized if possible.
> 

Weird. I can't spot any modification which could have caused that.

Would it be possible to identify the patch causing that?


Juergen
Borislav Petkov Nov. 3, 2022, 4:15 p.m. UTC | #3
On Thu, Nov 03, 2022 at 09:40:32AM +0100, Juergen Gross wrote:
> Would it be possible to identify the patch causing that?

Lemme try to find a smaller box which shows that too - that one is a
pain to bisect on.
Borislav Petkov Nov. 7, 2022, 7:25 p.m. UTC | #4
On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
> Lemme try to find a smaller box which shows that too - that one is a
> pain to bisect on.

Ok, couldn't find a smaller one (or maybe it had to be a big one to
tickle this out).

So I think it is the parallel setup thing:

x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
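
For readers without the series handy, the parallel path looks roughly
like this (my sketch from reading the patches; the handler name and the
exact conditions are assumptions, not quoted from the series, though
cache_aps_delayed_init does appear in the diff below):

static int cache_rendezvous_handler(void *unused)
{
        if (cache_aps_delayed_init || !cpu_online(smp_processor_id()))
                cache_cpu_init();

        return 0;
}

void cache_aps_init(void)
{
        if (memory_caching_control && cache_aps_delayed_init) {
                /* All online CPUs run cache_cpu_init() at the same time. */
                stop_machine_cpuslocked(cache_rendezvous_handler, NULL,
                                        cpu_online_mask);
                cache_aps_delayed_init = false;
        }
}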

Note that before it, it would do the configuration sequentially on each
CPU:

[    0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[    0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[    0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
[    0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[    0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
[    0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
...

and so on.

Now, it would do it all in parallel:

[    0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[    0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[    0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
[    0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
								      ^^^^^^

Note that last thing. That comes from (with debug output added):

void mtrr_disable(struct cache_state *state)
{
        unsigned int cpu = smp_processor_id();
        u64 msrval;

        /* Save MTRR state */
        rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

        /* Disable MTRRs, and set the default type to uncached */
        mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
                   state->mtrr_deftype_hi);

        rdmsrl(MSR_MTRRdefType, msrval);

        pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
                __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);
}

The "read: (0x0:0)" basically says that

	state->mtrr_deftype_lo, state->mtrr_deftype_hi

are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00 on most
cores except a handful and it means that MTRRs and Fixed Range are
enabled. In total, they're these cores here:

[    0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762247] MTRR: mtrr_disable: CPU26, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762685] MTRR: mtrr_disable: CPU68, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762725] MTRR: mtrr_disable: CPU17, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762685] MTRR: mtrr_disable: CPU69, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762800] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762734] MTRR: mtrr_disable: CPU13, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762720] MTRR: mtrr_disable: CPU24, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762696] MTRR: mtrr_disable: CPU66, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762716] MTRR: mtrr_disable: CPU48, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762693] MTRR: mtrr_disable: CPU57, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762519] MTRR: mtrr_disable: CPU87, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762532] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762755] MTRR: mtrr_disable: CPU32, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762693] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762861] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762724] MTRR: mtrr_disable: CPU21, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762640] MTRR: mtrr_disable: CPU15, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762615] MTRR: mtrr_disable: CPU50, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762741] MTRR: mtrr_disable: CPU40, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762738] MTRR: mtrr_disable: CPU37, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762716] MTRR: mtrr_disable: CPU25, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762512] MTRR: mtrr_disable: CPU59, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762721] MTRR: mtrr_disable: CPU45, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762682] MTRR: mtrr_disable: CPU56, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762583] MTRR: mtrr_disable: CPU124, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762751] MTRR: mtrr_disable: CPU12, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762741] MTRR: mtrr_disable: CPU9, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762575] MTRR: mtrr_disable: CPU51, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762632] MTRR: mtrr_disable: CPU100, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762688] MTRR: mtrr_disable: CPU61, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762610] MTRR: mtrr_disable: CPU105, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762721] MTRR: mtrr_disable: CPU20, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.762583] MTRR: mtrr_disable: CPU47, MSR_MTRRdefType: 0x0, read: (0x0:0)
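
For reference, the MSR_MTRRdefType (0x2FF) bits in play here, per the
Intel SDM / AMD APM (decoding added for illustration):

/*
 * MSR_MTRRdefType, low 32 bits:
 *   bits 7:0  TYPE - default memory type
 *   bit  10   FE   - fixed-range MTRRs enable
 *   bit  11   E    - MTRR enable
 *
 * 0xc00 == E | FE: MTRRs and fixed-range MTRRs enabled, default type 0
 * (UC). The "& ~0xcff" in mtrr_disable() clears E, FE and TYPE, i.e.
 * disables MTRRs and sets the default type to uncached.
 */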

Now, if I add MFENCEs around those RDMSRs:

void mtrr_disable(struct cache_state *state)
{
        unsigned int cpu = smp_processor_id();
        u64 msrval;

        /* Save MTRR state */
        rdmsr(MSR_MTRRdefType, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

        __mb();

        /* Disable MTRRs, and set the default type to uncached */
        mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
                   state->mtrr_deftype_hi);

        __mb();

        rdmsrl(MSR_MTRRdefType, msrval);

        pr_info("%s: CPU%d, MSR_MTRRdefType: 0x%llx, read: (0x%x:%x)\n",
                __func__, cpu, msrval, state->mtrr_deftype_lo, state->mtrr_deftype_hi);

        __mb();
}

the number of affected cores becomes smaller:

[    0.765260] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765462] MTRR: mtrr_disable: CPU5, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765242] MTRR: mtrr_disable: CPU22, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765522] MTRR: mtrr_disable: CPU0, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765474] MTRR: mtrr_disable: CPU1, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765207] MTRR: mtrr_disable: CPU54, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765225] MTRR: mtrr_disable: CPU8, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765282] MTRR: mtrr_disable: CPU88, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765150] MTRR: mtrr_disable: CPU119, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765370] MTRR: mtrr_disable: CPU49, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765395] MTRR: mtrr_disable: CPU16, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765348] MTRR: mtrr_disable: CPU52, MSR_MTRRdefType: 0x0, read: (0x0:0)
[    0.765270] MTRR: mtrr_disable: CPU58, MSR_MTRRdefType: 0x0, read: (0x0:0)

which basically hints at some speculative fun where we end up reading
the MSR *after* the write to it has already happened. After this thing:

        /* Disable MTRRs, and set the default type to uncached */
        mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
                   state->mtrr_deftype_hi);

and thus when we read it, we already read the disabled state. But this
is only a conjecture because I still have no clear idea how TF would
that even happen?!?

Needless to say, this fixes it, ofc:

diff --git a/arch/x86/kernel/cpu/cacheinfo.c b/arch/x86/kernel/cpu/cacheinfo.c
index 3805a6d32d37..4a685898caf3 100644
--- a/arch/x86/kernel/cpu/cacheinfo.c
+++ b/arch/x86/kernel/cpu/cacheinfo.c
@@ -1116,12 +1116,14 @@ void cache_enable(struct cache_state *state)
                __write_cr4(state->cr4);
 }
 
+static DEFINE_RAW_SPINLOCK(set_atomicity_lock);
+
 static void cache_cpu_init(void)
 {
        unsigned long flags;
        struct cache_state state = { };
 
-       local_irq_save(flags);
+       raw_spin_lock_irqsave(&set_atomicity_lock, flags);
        cache_disable(&state);
 
        if (memory_caching_control & CACHE_MTRR)
@@ -1131,7 +1133,7 @@ static void cache_cpu_init(void)
                pat_cpu_init();
 
        cache_enable(&state);
-       local_irq_restore(flags);
+       raw_spin_unlock_irqrestore(&set_atomicity_lock, flags);
 }
 
 static bool cache_aps_delayed_init = true;

---

and frankly, considering how we have bigger fish to fry, I'd say we do
it the old way and leave that can'o'worms half-opened.

Unless you wanna continue poking at it. I can give you access to that
box at work...

Thx.
Juergen Gross Nov. 8, 2022, 7:30 a.m. UTC | #5
On 07.11.22 20:25, Borislav Petkov wrote:
> On Thu, Nov 03, 2022 at 05:15:52PM +0100, Borislav Petkov wrote:
>> Lemme try to find a smaller box which shows that too - that one is a
>> pain to bisect on.
> 
> Ok, couldn't find a smaller one (or maybe it had to be a big one to
> tickle this out).
> 
> So I think it is the parallel setup thing:
> 
> x86/mtrr: Do MTRR/PAT setup on all secondary CPUs in parallel
> 
> Note that before it, it would do the configuration sequentially on each
> CPU:
> 
> [    0.759239] MTRR: prepare_set: CPU83, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.759239] MTRR: set_mtrr_state: CPU83, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [    0.760794] MTRR: post_set: CPU83, MSR_MTRRdefType will write: (0xc00:0)
> [    0.761151] MTRR: prepare_set: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761151] MTRR: set_mtrr_state: CPU70, mtrr_deftype_lo: 0xc00, mtrr_state.def_type: 0, mtrr_state.enabled: 3
> [    0.761151] MTRR: post_set: CPU70, MSR_MTRRdefType will write: (0xc00:0)
> ...
> 
> and so on.
> 
> Now, it would do it all in parallel:
> 
> [    0.762006] MTRR: mtrr_disable: CPU70, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761916] MTRR: mtrr_disable: CPU18, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.761808] MTRR: mtrr_disable: CPU82, MSR_MTRRdefType: 0x0, read: (0xc00:0)
> [    0.762593] MTRR: mtrr_disable: CPU6, MSR_MTRRdefType: 0x0, read: (0x0:0)
> 								      ^^^^^^
> 
> Note that last thing. That comes from (with debug output added):
> 
> [...]
> 
> The "read: (0x0:0)" basically says that
> 
> 	state->mtrr_deftype_lo, state->mtrr_deftype_hi
> 
> are both 0 already. BUT(!), they should NOT be. The low piece is 0xc00 on most
> cores except a handful and it means that MTRRs and Fixed Range are
> enabled. In total, they're these cores here:
> 
> [...]
> 
> Now, if I add MFENCEs around those RDMSRs:
> 
> [...]
> 
> the number of affected cores becomes smaller:

Probably not because of the fencing, but because of the different timing.

> 
> [...]
> 
> which basically hints at some speculative fun where we end up reading
> the MSR *after* the write to it has already happened. After this thing:
> 
>          /* Disable MTRRs, and set the default type to uncached */
>          mtrr_wrmsr(MSR_MTRRdefType, state->mtrr_deftype_lo & ~0xcff,
>                     state->mtrr_deftype_hi);
> 
> and thus when we read it, we already read the disabled state. But this
> is only a conjecture because I still have no clear idea how TF would
> that even happen?!?

Yeah, and why doesn't it happen when we handle only one cpu at a time?

There might be some interaction between the cpus ...

> 
> Needless to say, this fixes it, ofc:
> 
> [...]
> 
> and frankly, considering how we have bigger fish to fry, I'd say we do
> it the old way and leave that can'o'worms half-opened.

I agree to keep this patch out of the series for now.

> 
> Unless you wanna continue poking at it. I can give you access to that
> box at work...

Yes, please. I suspect there are some additional requirements for updating
MTRRs in parallel, or this is "just" a CPU bug.
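
For context, the documented MP update sequence is stricter than just
"each CPU updates its own MTRRs" (paraphrased from the Intel SDM,
"MTRR Considerations in MP Systems"; sketch from memory):

/*
 * 1. Broadcast to all processors and rendezvous, interrupts disabled.
 * 2. Enter no-fill cache mode (CR0.CD = 1, CR0.NW = 0) and WBINVD.
 * 3. Flush TLBs (toggle CR4.PGE if PGE was enabled).
 * 4. Disable MTRRs (clear MTRRdefType.E), update the MTRRs.
 * 5. Re-enable MTRRs, flush caches and TLBs again, leave no-fill mode.
 * 6. Rendezvous again before re-enabling interrupts.
 */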


Juergen