diff mbox series

[v2] Subject: x86/PAT: Report PAT on CPUs that support PAT without MTRR

Message ID 9d5070ae4f3e956a95d3f50e24f1a93488b9ff52.1657671676.git.brchuckz@aol.com (mailing list archive)
State New, archived
Headers show
Series [v2] Subject: x86/PAT: Report PAT on CPUs that support PAT without MTRR | expand

Commit Message

Chuck Zmudzinski July 13, 2022, 1:36 a.m. UTC
The commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it")
incorrectly failed to account for the case in init_cache_modes() when
CPUs do support PAT and falsely reported PAT to be disabled when in
fact PAT is enabled. In some environments, notably in Xen PV domains,
MTRR is disabled but PAT is still enabled, and that is the case
that the aforementioned commit failed to account for.

As an unfortunate consequnce, the pat_enabled() function currently does
not correctly report that PAT is enabled in such environments. The fix
is implemented in init_cache_modes() by setting pat_bp_enabled to true
in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
to account for.

This approach arranges for pat_enabled() to return true in the Xen PV
environment without undermining the rest of PAT MSR management logic
that considers PAT to be disabled: Specifically, no writes to the PAT
MSR should occur.

This patch fixes a regression that some users are experiencing with
Linux as a Xen Dom0 driving particular Intel graphics devices by
correctly reporting to the Intel i915 driver that PAT is enabled where
previously it was falsely reporting that PAT is disabled. Some users
are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
Dom0 are experiencing reduced graphics performance because the keying of
the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
means that in particular graphics frame buffer accesses are quite a bit
less performant than possible without this patch.

Also, with the current code, in the Xen PV environment, PAT will not be
disabled if the administrator sets the "nopat" boot option. Introduce
a new boolean variable, pat_force_disable, to forcibly disable PAT
when the administrator sets the "nopat" option to override the default
behavior of using the PAT configuration that Xen has provided.

For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).

Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
Co-developed-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
---
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
    *Add the necessary code to incorporate the "nopat" fix
    *void init_cache_modes(void) -> void __init init_cache_modes(void)
    *Add Jan Beulich as Co-developer (Jan has not signed off yet)
    *Expand the commit message to include relevant parts of the commit
     message of Jan Beulich's proposed patch for this problem
    *Fix 'else if ... {' placement and indentation
    *Remove indication the backport to stable branches is only back to 5.17.y

I think these changes address all the comments on the original patch

I added Jan Beulich as a Co-developer because Juergen Gross asked me to
include Jan's idea for fixing "nopat" that was missing from the first
version of the patch.

The patch has been tested, it works as expected with and without nopat
in the Xen PV Dom0 environment. That is, "nopat" causes the system to
exhibit the effects and problems that lack of PAT support causes.

 arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

Jürgen Groß July 13, 2022, 4:12 a.m. UTC | #1
On 13.07.22 03:36, Chuck Zmudzinski wrote:
> The commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> incorrectly failed to account for the case in init_cache_modes() when
> CPUs do support PAT and falsely reported PAT to be disabled when in
> fact PAT is enabled. In some environments, notably in Xen PV domains,
> MTRR is disabled but PAT is still enabled, and that is the case
> that the aforementioned commit failed to account for.
> 
> As an unfortunate consequnce, the pat_enabled() function currently does
> not correctly report that PAT is enabled in such environments. The fix
> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> to account for.
> 
> This approach arranges for pat_enabled() to return true in the Xen PV
> environment without undermining the rest of PAT MSR management logic
> that considers PAT to be disabled: Specifically, no writes to the PAT
> MSR should occur.
> 
> This patch fixes a regression that some users are experiencing with
> Linux as a Xen Dom0 driving particular Intel graphics devices by
> correctly reporting to the Intel i915 driver that PAT is enabled where
> previously it was falsely reporting that PAT is disabled. Some users
> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> Dom0 are experiencing reduced graphics performance because the keying of
> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> means that in particular graphics frame buffer accesses are quite a bit
> less performant than possible without this patch.
> 
> Also, with the current code, in the Xen PV environment, PAT will not be
> disabled if the administrator sets the "nopat" boot option. Introduce
> a new boolean variable, pat_force_disable, to forcibly disable PAT
> when the administrator sets the "nopat" option to override the default
> behavior of using the PAT configuration that Xen has provided.
> 
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
> 
> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Jan Beulich July 13, 2022, 6:18 a.m. UTC | #2
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>     *Add the necessary code to incorporate the "nopat" fix
>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>     *Expand the commit message to include relevant parts of the commit
>      message of Jan Beulich's proposed patch for this problem
>     *Fix 'else if ... {' placement and indentation
>     *Remove indication the backport to stable branches is only back to 5.17.y
> 
> I think these changes address all the comments on the original patch
> 
> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> include Jan's idea for fixing "nopat" that was missing from the first
> version of the patch.

You've sufficiently altered this change to clearly no longer want my
S-o-b; unfortunately in fact I think you broke things:

> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>  		rdmsrl(MSR_IA32_CR_PAT, pat);
>  	}
>  
> -	if (!pat) {
> +	if (!pat || pat_force_disabled) {

By checking the new variable here ...

>  		/*
>  		 * No PAT. Emulate the PAT table that corresponds to the two
>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>  		 */
>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);

... you put in place a software view which doesn't match hardware. I
continue to think that ...

> +	} else if (!pat_bp_enabled) {

... the variable wants checking here instead (at which point, yes,
this comes quite close to simply being a v2 of my original patch).

By using !pat_bp_enabled here you actually broaden where the change
would take effect. Iirc Boris had asked to narrow things (besides
voicing opposition to this approach altogether). Even without that
request I wonder whether you aren't going to far with this.

Jan
Chuck Zmudzinski July 13, 2022, 8:51 a.m. UTC | #3
On 7/13/22 2:18 AM, Jan Beulich wrote:
> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >     *Add the necessary code to incorporate the "nopat" fix
> >     *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >     *Expand the commit message to include relevant parts of the commit
> >      message of Jan Beulich's proposed patch for this problem
> >     *Fix 'else if ... {' placement and indentation
> >     *Remove indication the backport to stable branches is only back to 5.17.y
> > 
> > I think these changes address all the comments on the original patch
> > 
> > I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > include Jan's idea for fixing "nopat" that was missing from the first
> > version of the patch.
>
> You've sufficiently altered this change to clearly no longer want my
> S-o-b; unfortunately in fact I think you broke things:

Well, I hope we can come to an agreement so I have
your S-o-b. But that would probably require me to remove
Juergen's R-b.

> > @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >  		rdmsrl(MSR_IA32_CR_PAT, pat);
> >  	}
> >  
> > -	if (!pat) {
> > +	if (!pat || pat_force_disabled) {
>
> By checking the new variable here ...
>
> >  		/*
> >  		 * No PAT. Emulate the PAT table that corresponds to the two
> >  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >  		 */
> >  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>
> ... you put in place a software view which doesn't match hardware. I
> continue to think that ...
>
> > +	} else if (!pat_bp_enabled) {
>
> ... the variable wants checking here instead (at which point, yes,
> this comes quite close to simply being a v2 of my original patch).
>
> By using !pat_bp_enabled here you actually broaden where the change
> would take effect. Iirc Boris had asked to narrow things (besides
> voicing opposition to this approach altogether). Even without that
> request I wonder whether you aren't going to far with this.
>
> Jan

I thought about checking for the administrator's "nopat"
setting where you suggest which would limit the effect
of "nopat" to not reporting PAT as enabled to device
drivers who query for PAT availability using pat_enabled().
The main reason I did not do that is that due to the fact
that we cannot write to the PAT MSR, we cannot really
disable PAT. But we come closer to respecting the wishes
of the administrator by configuring the caching modes as
if PAT is actually disabled by the hardware or firmware
when in fact it is not.

What would you propose logging as a message when
we report PAT as disabled via pat_enabled()? The main
reason I did not choose to check the new variable in the
new 'else if' block is that I could not figure out what to
tell the administrator in that case. I think we would have
to log something like, "nopat is set, but we cannot disable
PAT, doing our best to disable PAT by not reporting PAT
as enabled via pat_enabled(), but that does not guarantee
that kernel drivers and components cannot use PAT if they
query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
instead of pat_enabled()." However, I acknowledge WC mappings
would still be disabled because arch_can_pci_mmap_wc() will
be false if pat_enabled() is false.

Perhaps we also need to log something if we keep the
check for "nopat" where I placed it. We could say something
like: "nopat is set, but we cannot disable hardware/firmware
PAT support, so we are emulating as if there is no PAT support
which puts in place a software view that does not match
hardware."

No matter what, because we cannot write to PAT MSR in
the Xen PV case, we probably need to log something to
explain the problems associated with trying to honor the
administrator's request. Also, what log level should it be.
Should it be a pr_warn instead of a pr_info?

I will agree to implement your approach of checking the new
variable where you suggest and limiting the effect of "nopat"
to not reporting PAT as enabled via pat_enabled() if there is a
consensus about what we should log to the administrator in that
case and if Juergen and/or Boris agrees with it.

Chuck
Jan Beulich July 13, 2022, 9:09 a.m. UTC | #4
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> On 7/13/22 2:18 AM, Jan Beulich wrote:
>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>>>     *Add the necessary code to incorporate the "nopat" fix
>>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
>>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>>>     *Expand the commit message to include relevant parts of the commit
>>>      message of Jan Beulich's proposed patch for this problem
>>>     *Fix 'else if ... {' placement and indentation
>>>     *Remove indication the backport to stable branches is only back to 5.17.y
>>>
>>> I think these changes address all the comments on the original patch
>>>
>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
>>> include Jan's idea for fixing "nopat" that was missing from the first
>>> version of the patch.
>>
>> You've sufficiently altered this change to clearly no longer want my
>> S-o-b; unfortunately in fact I think you broke things:
> 
> Well, I hope we can come to an agreement so I have
> your S-o-b. But that would probably require me to remove
> Juergen's R-b.
> 
>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
>>>  	}
>>>  
>>> -	if (!pat) {
>>> +	if (!pat || pat_force_disabled) {
>>
>> By checking the new variable here ...
>>
>>>  		/*
>>>  		 * No PAT. Emulate the PAT table that corresponds to the two
>>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>>  		 */
>>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>
>> ... you put in place a software view which doesn't match hardware. I
>> continue to think that ...
>>
>>> +	} else if (!pat_bp_enabled) {
>>
>> ... the variable wants checking here instead (at which point, yes,
>> this comes quite close to simply being a v2 of my original patch).
>>
>> By using !pat_bp_enabled here you actually broaden where the change
>> would take effect. Iirc Boris had asked to narrow things (besides
>> voicing opposition to this approach altogether). Even without that
>> request I wonder whether you aren't going to far with this.
>>
>> Jan
> 
> I thought about checking for the administrator's "nopat"
> setting where you suggest which would limit the effect
> of "nopat" to not reporting PAT as enabled to device
> drivers who query for PAT availability using pat_enabled().
> The main reason I did not do that is that due to the fact
> that we cannot write to the PAT MSR, we cannot really
> disable PAT. But we come closer to respecting the wishes
> of the administrator by configuring the caching modes as
> if PAT is actually disabled by the hardware or firmware
> when in fact it is not.
> 
> What would you propose logging as a message when
> we report PAT as disabled via pat_enabled()? The main
> reason I did not choose to check the new variable in the
> new 'else if' block is that I could not figure out what to
> tell the administrator in that case. I think we would have
> to log something like, "nopat is set, but we cannot disable
> PAT, doing our best to disable PAT by not reporting PAT
> as enabled via pat_enabled(), but that does not guarantee
> that kernel drivers and components cannot use PAT if they
> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> instead of pat_enabled()." However, I acknowledge WC mappings
> would still be disabled because arch_can_pci_mmap_wc() will
> be false if pat_enabled() is false.
> 
> Perhaps we also need to log something if we keep the
> check for "nopat" where I placed it. We could say something
> like: "nopat is set, but we cannot disable hardware/firmware
> PAT support, so we are emulating as if there is no PAT support
> which puts in place a software view that does not match
> hardware."
> 
> No matter what, because we cannot write to PAT MSR in
> the Xen PV case, we probably need to log something to
> explain the problems associated with trying to honor the
> administrator's request. Also, what log level should it be.
> Should it be a pr_warn instead of a pr_info?

I'm afraid I'm the wrong one to answer logging questions. As you
can see from my original patch, I didn't add any new logging (and
no addition was requested in the comments that I have got). I also
don't think "nopat" has ever meant "disable PAT", as the feature
is either there or not. Instead I think it was always seen as
"disable fiddling with PAT", which by implication means using
whatever is there (if the feature / MSR itself is available).

Jan
Chuck Zmudzinski July 13, 2022, 10:36 a.m. UTC | #5
On 7/13/2022 5:09 AM, Jan Beulich wrote:
> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> > On 7/13/22 2:18 AM, Jan Beulich wrote:
> >> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> >>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>     *Add the necessary code to incorporate the "nopat" fix
> >>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>     *Expand the commit message to include relevant parts of the commit
> >>>      message of Jan Beulich's proposed patch for this problem
> >>>     *Fix 'else if ... {' placement and indentation
> >>>     *Remove indication the backport to stable branches is only back to 5.17.y
> >>>
> >>> I think these changes address all the comments on the original patch
> >>>
> >>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>> include Jan's idea for fixing "nopat" that was missing from the first
> >>> version of the patch.
> >>
> >> You've sufficiently altered this change to clearly no longer want my
> >> S-o-b; unfortunately in fact I think you broke things:
> > 
> > Well, I hope we can come to an agreement so I have
> > your S-o-b. But that would probably require me to remove
> > Juergen's R-b.
> > 
> >>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>  	}
> >>>  
> >>> -	if (!pat) {
> >>> +	if (!pat || pat_force_disabled) {
> >>
> >> By checking the new variable here ...
> >>
> >>>  		/*
> >>>  		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>  		 */
> >>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>
> >> ... you put in place a software view which doesn't match hardware. I
> >> continue to think that ...
> >>
> >>> +	} else if (!pat_bp_enabled) {
> >>
> >> ... the variable wants checking here instead (at which point, yes,
> >> this comes quite close to simply being a v2 of my original patch).
> >>
> >> By using !pat_bp_enabled here you actually broaden where the change
> >> would take effect. Iirc Boris had asked to narrow things (besides
> >> voicing opposition to this approach altogether). Even without that
> >> request I wonder whether you aren't going to far with this.
> >>
> >> Jan
> > 
> > I thought about checking for the administrator's "nopat"
> > setting where you suggest which would limit the effect
> > of "nopat" to not reporting PAT as enabled to device
> > drivers who query for PAT availability using pat_enabled().
> > The main reason I did not do that is that due to the fact
> > that we cannot write to the PAT MSR, we cannot really
> > disable PAT. But we come closer to respecting the wishes
> > of the administrator by configuring the caching modes as
> > if PAT is actually disabled by the hardware or firmware
> > when in fact it is not.
> > 
> > What would you propose logging as a message when
> > we report PAT as disabled via pat_enabled()? The main
> > reason I did not choose to check the new variable in the
> > new 'else if' block is that I could not figure out what to
> > tell the administrator in that case. I think we would have
> > to log something like, "nopat is set, but we cannot disable
> > PAT, doing our best to disable PAT by not reporting PAT
> > as enabled via pat_enabled(), but that does not guarantee
> > that kernel drivers and components cannot use PAT if they
> > query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> > instead of pat_enabled()." However, I acknowledge WC mappings
> > would still be disabled because arch_can_pci_mmap_wc() will
> > be false if pat_enabled() is false.
> > 
> > Perhaps we also need to log something if we keep the
> > check for "nopat" where I placed it. We could say something
> > like: "nopat is set, but we cannot disable hardware/firmware
> > PAT support, so we are emulating as if there is no PAT support
> > which puts in place a software view that does not match
> > hardware."
> > 
> > No matter what, because we cannot write to PAT MSR in
> > the Xen PV case, we probably need to log something to
> > explain the problems associated with trying to honor the
> > administrator's request. Also, what log level should it be.
> > Should it be a pr_warn instead of a pr_info?
>
> I'm afraid I'm the wrong one to answer logging questions. As you
> can see from my original patch, I didn't add any new logging (and
> no addition was requested in the comments that I have got). I also
> don't think "nopat" has ever meant "disable PAT", as the feature
> is either there or not. Instead I think it was always seen as
> "disable fiddling with PAT", which by implication means using
> whatever is there (if the feature / MSR itself is available).

IIRC, I do think I mentioned in the comments on your patch that
it would be preferable to mention in the commit message that
your patch would change the current behavior of "nopat" on
Xen. The question is, how much do we want to change the
current behavior of "nopat" on Xen. I think if we have to change
the current behavior of "nopat" on Xen and if we are going
to propagate that change to all current stable branches all
the way back to 4.9.y,, we better make a lot of noise about
what we are doing here.

Chuck
Chuck Zmudzinski July 13, 2022, 11:10 a.m. UTC | #6
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> On 7/13/2022 5:09 AM, Jan Beulich wrote:
> > On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> > > On 7/13/22 2:18 AM, Jan Beulich wrote:
> > >> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> > >>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> > >>>     *Add the necessary code to incorporate the "nopat" fix
> > >>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
> > >>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> > >>>     *Expand the commit message to include relevant parts of the commit
> > >>>      message of Jan Beulich's proposed patch for this problem
> > >>>     *Fix 'else if ... {' placement and indentation
> > >>>     *Remove indication the backport to stable branches is only back to 5.17.y
> > >>>
> > >>> I think these changes address all the comments on the original patch
> > >>>
> > >>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > >>> include Jan's idea for fixing "nopat" that was missing from the first
> > >>> version of the patch.
> > >>
> > >> You've sufficiently altered this change to clearly no longer want my
> > >> S-o-b; unfortunately in fact I think you broke things:
> > > 
> > > Well, I hope we can come to an agreement so I have
> > > your S-o-b. But that would probably require me to remove
> > > Juergen's R-b.
> > > 
> > >>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> > >>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
> > >>>  	}
> > >>>  
> > >>> -	if (!pat) {
> > >>> +	if (!pat || pat_force_disabled) {
> > >>
> > >> By checking the new variable here ...
> > >>
> > >>>  		/*
> > >>>  		 * No PAT. Emulate the PAT table that corresponds to the two
> > >>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > >>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> > >>>  		 */
> > >>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> > >>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > >>
> > >> ... you put in place a software view which doesn't match hardware. I
> > >> continue to think that ...
> > >>
> > >>> +	} else if (!pat_bp_enabled) {
> > >>
> > >> ... the variable wants checking here instead (at which point, yes,
> > >> this comes quite close to simply being a v2 of my original patch).
> > >>
> > >> By using !pat_bp_enabled here you actually broaden where the change
> > >> would take effect. Iirc Boris had asked to narrow things (besides
> > >> voicing opposition to this approach altogether). Even without that
> > >> request I wonder whether you aren't going to far with this.
> > >>
> > >> Jan
> > > 
> > > I thought about checking for the administrator's "nopat"
> > > setting where you suggest which would limit the effect
> > > of "nopat" to not reporting PAT as enabled to device
> > > drivers who query for PAT availability using pat_enabled().
> > > The main reason I did not do that is that due to the fact
> > > that we cannot write to the PAT MSR, we cannot really
> > > disable PAT. But we come closer to respecting the wishes
> > > of the administrator by configuring the caching modes as
> > > if PAT is actually disabled by the hardware or firmware
> > > when in fact it is not.
> > > 
> > > What would you propose logging as a message when
> > > we report PAT as disabled via pat_enabled()? The main
> > > reason I did not choose to check the new variable in the
> > > new 'else if' block is that I could not figure out what to
> > > tell the administrator in that case. I think we would have
> > > to log something like, "nopat is set, but we cannot disable
> > > PAT, doing our best to disable PAT by not reporting PAT
> > > as enabled via pat_enabled(), but that does not guarantee
> > > that kernel drivers and components cannot use PAT if they
> > > query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> > > instead of pat_enabled()." However, I acknowledge WC mappings
> > > would still be disabled because arch_can_pci_mmap_wc() will
> > > be false if pat_enabled() is false.
> > > 
> > > Perhaps we also need to log something if we keep the
> > > check for "nopat" where I placed it. We could say something
> > > like: "nopat is set, but we cannot disable hardware/firmware
> > > PAT support, so we are emulating as if there is no PAT support
> > > which puts in place a software view that does not match
> > > hardware."
> > > 
> > > No matter what, because we cannot write to PAT MSR in
> > > the Xen PV case, we probably need to log something to
> > > explain the problems associated with trying to honor the
> > > administrator's request. Also, what log level should it be.
> > > Should it be a pr_warn instead of a pr_info?
> >
> > I'm afraid I'm the wrong one to answer logging questions. As you
> > can see from my original patch, I didn't add any new logging (and
> > no addition was requested in the comments that I have got). I also
> > don't think "nopat" has ever meant "disable PAT", as the feature
> > is either there or not. Instead I think it was always seen as
> > "disable fiddling with PAT", which by implication means using
> > whatever is there (if the feature / MSR itself is available).
>
> IIRC, I do think I mentioned in the comments on your patch that
> it would be preferable to mention in the commit message that
> your patch would change the current behavior of "nopat" on
> Xen. The question is, how much do we want to change the
> current behavior of "nopat" on Xen. I think if we have to change
> the current behavior of "nopat" on Xen and if we are going
> to propagate that change to all current stable branches all
> the way back to 4.9.y,, we better make a lot of noise about
> what we are doing here.
>
> Chuck

And in addition, if we are going to backport this patch to
all current stable branches, we better have a really, really,
good reason for changing the behavior of "nopat" on Xen.

Does such a reason exist? Or perhaps, Juergen, could you
accept a v3 of my patch that does not need to decide
how to backport the change to "nopat" to the stable branches
that are affected by the current behavior of "nopat" on Xen?

To do such a v3, I would just have to fix the style problems
with my original patch and not come to an agreement with
Jan about how to deal with the "nopat" problem.

Chuck

Chuck
Jan Beulich July 13, 2022, 1:34 p.m. UTC | #7
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>>>>>>     *Add the necessary code to incorporate the "nopat" fix
>>>>>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
>>>>>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>>>>>>     *Expand the commit message to include relevant parts of the commit
>>>>>>      message of Jan Beulich's proposed patch for this problem
>>>>>>     *Fix 'else if ... {' placement and indentation
>>>>>>     *Remove indication the backport to stable branches is only back to 5.17.y
>>>>>>
>>>>>> I think these changes address all the comments on the original patch
>>>>>>
>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
>>>>>> version of the patch.
>>>>>
>>>>> You've sufficiently altered this change to clearly no longer want my
>>>>> S-o-b; unfortunately in fact I think you broke things:
>>>>
>>>> Well, I hope we can come to an agreement so I have
>>>> your S-o-b. But that would probably require me to remove
>>>> Juergen's R-b.
>>>>
>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>>>>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
>>>>>>  	}
>>>>>>  
>>>>>> -	if (!pat) {
>>>>>> +	if (!pat || pat_force_disabled) {
>>>>>
>>>>> By checking the new variable here ...
>>>>>
>>>>>>  		/*
>>>>>>  		 * No PAT. Emulate the PAT table that corresponds to the two
>>>>>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>>>>>  		 */
>>>>>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>>>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>>>
>>>>> ... you put in place a software view which doesn't match hardware. I
>>>>> continue to think that ...
>>>>>
>>>>>> +	} else if (!pat_bp_enabled) {
>>>>>
>>>>> ... the variable wants checking here instead (at which point, yes,
>>>>> this comes quite close to simply being a v2 of my original patch).
>>>>>
>>>>> By using !pat_bp_enabled here you actually broaden where the change
>>>>> would take effect. Iirc Boris had asked to narrow things (besides
>>>>> voicing opposition to this approach altogether). Even without that
>>>>> request I wonder whether you aren't going to far with this.
>>>>>
>>>>> Jan
>>>>
>>>> I thought about checking for the administrator's "nopat"
>>>> setting where you suggest which would limit the effect
>>>> of "nopat" to not reporting PAT as enabled to device
>>>> drivers who query for PAT availability using pat_enabled().
>>>> The main reason I did not do that is that due to the fact
>>>> that we cannot write to the PAT MSR, we cannot really
>>>> disable PAT. But we come closer to respecting the wishes
>>>> of the administrator by configuring the caching modes as
>>>> if PAT is actually disabled by the hardware or firmware
>>>> when in fact it is not.
>>>>
>>>> What would you propose logging as a message when
>>>> we report PAT as disabled via pat_enabled()? The main
>>>> reason I did not choose to check the new variable in the
>>>> new 'else if' block is that I could not figure out what to
>>>> tell the administrator in that case. I think we would have
>>>> to log something like, "nopat is set, but we cannot disable
>>>> PAT, doing our best to disable PAT by not reporting PAT
>>>> as enabled via pat_enabled(), but that does not guarantee
>>>> that kernel drivers and components cannot use PAT if they
>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
>>>> instead of pat_enabled()." However, I acknowledge WC mappings
>>>> would still be disabled because arch_can_pci_mmap_wc() will
>>>> be false if pat_enabled() is false.
>>>>
>>>> Perhaps we also need to log something if we keep the
>>>> check for "nopat" where I placed it. We could say something
>>>> like: "nopat is set, but we cannot disable hardware/firmware
>>>> PAT support, so we are emulating as if there is no PAT support
>>>> which puts in place a software view that does not match
>>>> hardware."
>>>>
>>>> No matter what, because we cannot write to PAT MSR in
>>>> the Xen PV case, we probably need to log something to
>>>> explain the problems associated with trying to honor the
>>>> administrator's request. Also, what log level should it be.
>>>> Should it be a pr_warn instead of a pr_info?
>>>
>>> I'm afraid I'm the wrong one to answer logging questions. As you
>>> can see from my original patch, I didn't add any new logging (and
>>> no addition was requested in the comments that I have got). I also
>>> don't think "nopat" has ever meant "disable PAT", as the feature
>>> is either there or not. Instead I think it was always seen as
>>> "disable fiddling with PAT", which by implication means using
>>> whatever is there (if the feature / MSR itself is available).
>>
>> IIRC, I do think I mentioned in the comments on your patch that
>> it would be preferable to mention in the commit message that
>> your patch would change the current behavior of "nopat" on
>> Xen. The question is, how much do we want to change the
>> current behavior of "nopat" on Xen. I think if we have to change
>> the current behavior of "nopat" on Xen and if we are going
>> to propagate that change to all current stable branches all
>> the way back to 4.9.y,, we better make a lot of noise about
>> what we are doing here.
>>
>> Chuck
> 
> And in addition, if we are going to backport this patch to
> all current stable branches, we better have a really, really,
> good reason for changing the behavior of "nopat" on Xen.
> 
> Does such a reason exist?

Well, the simple reason is: It doesn't work the same way under Xen
and non-Xen (in turn because, before my patch or whatever equivalent
work, things don't work properly anyway, PAT-wise). Yet it definitely
ought to behave the same everywhere, imo.

Jan

> Or perhaps, Juergen, could you
> accept a v3 of my patch that does not need to decide
> how to backport the change to "nopat" to the stable branches
> that are affected by the current behavior of "nopat" on Xen?
> 
> To do such a v3, I would just have to fix the style problems
> with my original patch and not come to an agreement with
> Jan about how to deal with the "nopat" problem.
> 
> Chuck
> 
> Chuck
Jürgen Groß July 13, 2022, 1:45 p.m. UTC | #8
On 13.07.22 15:34, Jan Beulich wrote:
> On 13.07.2022 13:10, Chuck Zmudzinski wrote:
>> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
>>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
>>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
>>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
>>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
>>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>>>>>>>      *Add the necessary code to incorporate the "nopat" fix
>>>>>>>      *void init_cache_modes(void) -> void __init init_cache_modes(void)
>>>>>>>      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>>>>>>>      *Expand the commit message to include relevant parts of the commit
>>>>>>>       message of Jan Beulich's proposed patch for this problem
>>>>>>>      *Fix 'else if ... {' placement and indentation
>>>>>>>      *Remove indication the backport to stable branches is only back to 5.17.y
>>>>>>>
>>>>>>> I think these changes address all the comments on the original patch
>>>>>>>
>>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
>>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
>>>>>>> version of the patch.
>>>>>>
>>>>>> You've sufficiently altered this change to clearly no longer want my
>>>>>> S-o-b; unfortunately in fact I think you broke things:
>>>>>
>>>>> Well, I hope we can come to an agreement so I have
>>>>> your S-o-b. But that would probably require me to remove
>>>>> Juergen's R-b.
>>>>>
>>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>>>>>>   		rdmsrl(MSR_IA32_CR_PAT, pat);
>>>>>>>   	}
>>>>>>>   
>>>>>>> -	if (!pat) {
>>>>>>> +	if (!pat || pat_force_disabled) {
>>>>>>
>>>>>> By checking the new variable here ...
>>>>>>
>>>>>>>   		/*
>>>>>>>   		 * No PAT. Emulate the PAT table that corresponds to the two
>>>>>>>   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>>>>>>   		 */
>>>>>>>   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>>>>>   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>>>>
>>>>>> ... you put in place a software view which doesn't match hardware. I
>>>>>> continue to think that ...
>>>>>>
>>>>>>> +	} else if (!pat_bp_enabled) {
>>>>>>
>>>>>> ... the variable wants checking here instead (at which point, yes,
>>>>>> this comes quite close to simply being a v2 of my original patch).
>>>>>>
>>>>>> By using !pat_bp_enabled here you actually broaden where the change
>>>>>> would take effect. Iirc Boris had asked to narrow things (besides
>>>>>> voicing opposition to this approach altogether). Even without that
>>>>>> request I wonder whether you aren't going to far with this.
>>>>>>
>>>>>> Jan
>>>>>
>>>>> I thought about checking for the administrator's "nopat"
>>>>> setting where you suggest which would limit the effect
>>>>> of "nopat" to not reporting PAT as enabled to device
>>>>> drivers who query for PAT availability using pat_enabled().
>>>>> The main reason I did not do that is that due to the fact
>>>>> that we cannot write to the PAT MSR, we cannot really
>>>>> disable PAT. But we come closer to respecting the wishes
>>>>> of the administrator by configuring the caching modes as
>>>>> if PAT is actually disabled by the hardware or firmware
>>>>> when in fact it is not.
>>>>>
>>>>> What would you propose logging as a message when
>>>>> we report PAT as disabled via pat_enabled()? The main
>>>>> reason I did not choose to check the new variable in the
>>>>> new 'else if' block is that I could not figure out what to
>>>>> tell the administrator in that case. I think we would have
>>>>> to log something like, "nopat is set, but we cannot disable
>>>>> PAT, doing our best to disable PAT by not reporting PAT
>>>>> as enabled via pat_enabled(), but that does not guarantee
>>>>> that kernel drivers and components cannot use PAT if they
>>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
>>>>> instead of pat_enabled()." However, I acknowledge WC mappings
>>>>> would still be disabled because arch_can_pci_mmap_wc() will
>>>>> be false if pat_enabled() is false.
>>>>>
>>>>> Perhaps we also need to log something if we keep the
>>>>> check for "nopat" where I placed it. We could say something
>>>>> like: "nopat is set, but we cannot disable hardware/firmware
>>>>> PAT support, so we are emulating as if there is no PAT support
>>>>> which puts in place a software view that does not match
>>>>> hardware."
>>>>>
>>>>> No matter what, because we cannot write to PAT MSR in
>>>>> the Xen PV case, we probably need to log something to
>>>>> explain the problems associated with trying to honor the
>>>>> administrator's request. Also, what log level should it be.
>>>>> Should it be a pr_warn instead of a pr_info?
>>>>
>>>> I'm afraid I'm the wrong one to answer logging questions. As you
>>>> can see from my original patch, I didn't add any new logging (and
>>>> no addition was requested in the comments that I have got). I also
>>>> don't think "nopat" has ever meant "disable PAT", as the feature
>>>> is either there or not. Instead I think it was always seen as
>>>> "disable fiddling with PAT", which by implication means using
>>>> whatever is there (if the feature / MSR itself is available).
>>>
>>> IIRC, I do think I mentioned in the comments on your patch that
>>> it would be preferable to mention in the commit message that
>>> your patch would change the current behavior of "nopat" on
>>> Xen. The question is, how much do we want to change the
>>> current behavior of "nopat" on Xen. I think if we have to change
>>> the current behavior of "nopat" on Xen and if we are going
>>> to propagate that change to all current stable branches all
>>> the way back to 4.9.y,, we better make a lot of noise about
>>> what we are doing here.
>>>
>>> Chuck
>>
>> And in addition, if we are going to backport this patch to
>> all current stable branches, we better have a really, really,
>> good reason for changing the behavior of "nopat" on Xen.
>>
>> Does such a reason exist?
> 
> Well, the simple reason is: It doesn't work the same way under Xen
> and non-Xen (in turn because, before my patch or whatever equivalent
> work, things don't work properly anyway, PAT-wise). Yet it definitely
> ought to behave the same everywhere, imo.

There is Documentation/x86/pat.rst which rather clearly states, how
"nopat" is meant to work. It should not change the contents of the
PAT MSR and keep it just as it was set at boot time (the doc talks
about the "BIOS" setting of the MSR, and I guess in the Xen case
the hypervisor is kind of acting as the BIOS).

The question is, whether "nopat" needs to be translated to
pat_enabled() returning "false". I've found exactly one case where
"nopat" seems to be required for a driver to work correctly, but
this driver (drivers/media/pci/ivtv/ivtvfb.c) seems to rely on
MTRR availability, which isn't supported in Xen PV guests.

Other than that the "nopat" option seems to be a chicken bit from
the early days of PAT usage, which probably isn't really needed any
more. So I wouldn't be worried to drop "nopat" having any effect
on the system in Xen PV guests.

On bare metal it should still work, of course, if only for said
driver.

>> Or perhaps, Juergen, could you
>> accept a v3 of my patch that does not need to decide
>> how to backport the change to "nopat" to the stable branches
>> that are affected by the current behavior of "nopat" on Xen?

Note that it is not me to accept such a patch, this would be one
of the x86 maintainers (e.g. Boris).


Juergen
Chuck Zmudzinski July 13, 2022, 1:49 p.m. UTC | #9
On 7/13/2022 9:34 AM, Jan Beulich wrote:
> On 13.07.2022 13:10, Chuck Zmudzinski wrote:
> > On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> >> On 7/13/2022 5:09 AM, Jan Beulich wrote:
> >>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> >>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
> >>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> >>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>>>>     *Add the necessary code to incorporate the "nopat" fix
> >>>>>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>>>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>>>>     *Expand the commit message to include relevant parts of the commit
> >>>>>>      message of Jan Beulich's proposed patch for this problem
> >>>>>>     *Fix 'else if ... {' placement and indentation
> >>>>>>     *Remove indication the backport to stable branches is only back to 5.17.y
> >>>>>>
> >>>>>> I think these changes address all the comments on the original patch
> >>>>>>
> >>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>>>>> include Jan's idea for fixing "nopat" that was missing from the first
> >>>>>> version of the patch.
> >>>>>
> >>>>> You've sufficiently altered this change to clearly no longer want my
> >>>>> S-o-b; unfortunately in fact I think you broke things:
> >>>>
> >>>> Well, I hope we can come to an agreement so I have
> >>>> your S-o-b. But that would probably require me to remove
> >>>> Juergen's R-b.
> >>>>
> >>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>>>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>>>>  	}
> >>>>>>  
> >>>>>> -	if (!pat) {
> >>>>>> +	if (!pat || pat_force_disabled) {
> >>>>>
> >>>>> By checking the new variable here ...
> >>>>>
> >>>>>>  		/*
> >>>>>>  		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>>>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>>>>  		 */
> >>>>>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>>>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>>>
> >>>>> ... you put in place a software view which doesn't match hardware. I
> >>>>> continue to think that ...
> >>>>>
> >>>>>> +	} else if (!pat_bp_enabled) {
> >>>>>
> >>>>> ... the variable wants checking here instead (at which point, yes,
> >>>>> this comes quite close to simply being a v2 of my original patch).
> >>>>>
> >>>>> By using !pat_bp_enabled here you actually broaden where the change
> >>>>> would take effect. Iirc Boris had asked to narrow things (besides
> >>>>> voicing opposition to this approach altogether). Even without that
> >>>>> request I wonder whether you aren't going to far with this.
> >>>>>
> >>>>> Jan
> >>>>
> >>>> I thought about checking for the administrator's "nopat"
> >>>> setting where you suggest which would limit the effect
> >>>> of "nopat" to not reporting PAT as enabled to device
> >>>> drivers who query for PAT availability using pat_enabled().
> >>>> The main reason I did not do that is that due to the fact
> >>>> that we cannot write to the PAT MSR, we cannot really
> >>>> disable PAT. But we come closer to respecting the wishes
> >>>> of the administrator by configuring the caching modes as
> >>>> if PAT is actually disabled by the hardware or firmware
> >>>> when in fact it is not.
> >>>>
> >>>> What would you propose logging as a message when
> >>>> we report PAT as disabled via pat_enabled()? The main
> >>>> reason I did not choose to check the new variable in the
> >>>> new 'else if' block is that I could not figure out what to
> >>>> tell the administrator in that case. I think we would have
> >>>> to log something like, "nopat is set, but we cannot disable
> >>>> PAT, doing our best to disable PAT by not reporting PAT
> >>>> as enabled via pat_enabled(), but that does not guarantee
> >>>> that kernel drivers and components cannot use PAT if they
> >>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> >>>> instead of pat_enabled()." However, I acknowledge WC mappings
> >>>> would still be disabled because arch_can_pci_mmap_wc() will
> >>>> be false if pat_enabled() is false.
> >>>>
> >>>> Perhaps we also need to log something if we keep the
> >>>> check for "nopat" where I placed it. We could say something
> >>>> like: "nopat is set, but we cannot disable hardware/firmware
> >>>> PAT support, so we are emulating as if there is no PAT support
> >>>> which puts in place a software view that does not match
> >>>> hardware."
> >>>>
> >>>> No matter what, because we cannot write to PAT MSR in
> >>>> the Xen PV case, we probably need to log something to
> >>>> explain the problems associated with trying to honor the
> >>>> administrator's request. Also, what log level should it be.
> >>>> Should it be a pr_warn instead of a pr_info?
> >>>
> >>> I'm afraid I'm the wrong one to answer logging questions. As you
> >>> can see from my original patch, I didn't add any new logging (and
> >>> no addition was requested in the comments that I have got). I also
> >>> don't think "nopat" has ever meant "disable PAT", as the feature
> >>> is either there or not. Instead I think it was always seen as
> >>> "disable fiddling with PAT", which by implication means using
> >>> whatever is there (if the feature / MSR itself is available).
> >>
> >> IIRC, I do think I mentioned in the comments on your patch that
> >> it would be preferable to mention in the commit message that
> >> your patch would change the current behavior of "nopat" on
> >> Xen. The question is, how much do we want to change the
> >> current behavior of "nopat" on Xen. I think if we have to change
> >> the current behavior of "nopat" on Xen and if we are going
> >> to propagate that change to all current stable branches all
> >> the way back to 4.9.y,, we better make a lot of noise about
> >> what we are doing here.
> >>
> >> Chuck
> > 
> > And in addition, if we are going to backport this patch to
> > all current stable branches, we better have a really, really,
> > good reason for changing the behavior of "nopat" on Xen.
> > 
> > Does such a reason exist?
>
> Well, the simple reason is: It doesn't work the same way under Xen
> and non-Xen (in turn because, before my patch or whatever equivalent
> work, things don't work properly anyway, PAT-wise). Yet it definitely
> ought to behave the same everywhere, imo.
>
> Jan

IOW, you are saying PAT has been broken on Xen for a
long time, and it is necessary to fix it now not only on
master, but also on all the stable branches.

Why is it necessary to do it on all the stable branches?

The only valid reason I can think of is a zero-day exploit
that can only be mitigated by really disabling PAT on Xen.

Chuck
Jan Beulich July 13, 2022, 1:52 p.m. UTC | #10
On 13.07.2022 15:49, Chuck Zmudzinski wrote:
> On 7/13/2022 9:34 AM, Jan Beulich wrote:
>> On 13.07.2022 13:10, Chuck Zmudzinski wrote:
>>> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
>>>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
>>>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
>>>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
>>>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
>>>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>>>>>>>>     *Add the necessary code to incorporate the "nopat" fix
>>>>>>>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
>>>>>>>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>>>>>>>>     *Expand the commit message to include relevant parts of the commit
>>>>>>>>      message of Jan Beulich's proposed patch for this problem
>>>>>>>>     *Fix 'else if ... {' placement and indentation
>>>>>>>>     *Remove indication the backport to stable branches is only back to 5.17.y
>>>>>>>>
>>>>>>>> I think these changes address all the comments on the original patch
>>>>>>>>
>>>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
>>>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
>>>>>>>> version of the patch.
>>>>>>>
>>>>>>> You've sufficiently altered this change to clearly no longer want my
>>>>>>> S-o-b; unfortunately in fact I think you broke things:
>>>>>>
>>>>>> Well, I hope we can come to an agreement so I have
>>>>>> your S-o-b. But that would probably require me to remove
>>>>>> Juergen's R-b.
>>>>>>
>>>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>>>>>>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
>>>>>>>>  	}
>>>>>>>>  
>>>>>>>> -	if (!pat) {
>>>>>>>> +	if (!pat || pat_force_disabled) {
>>>>>>>
>>>>>>> By checking the new variable here ...
>>>>>>>
>>>>>>>>  		/*
>>>>>>>>  		 * No PAT. Emulate the PAT table that corresponds to the two
>>>>>>>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>>>>>>>  		 */
>>>>>>>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>>>>>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>>>>>>
>>>>>>> ... you put in place a software view which doesn't match hardware. I
>>>>>>> continue to think that ...
>>>>>>>
>>>>>>>> +	} else if (!pat_bp_enabled) {
>>>>>>>
>>>>>>> ... the variable wants checking here instead (at which point, yes,
>>>>>>> this comes quite close to simply being a v2 of my original patch).
>>>>>>>
>>>>>>> By using !pat_bp_enabled here you actually broaden where the change
>>>>>>> would take effect. Iirc Boris had asked to narrow things (besides
>>>>>>> voicing opposition to this approach altogether). Even without that
>>>>>>> request I wonder whether you aren't going to far with this.
>>>>>>>
>>>>>>> Jan
>>>>>>
>>>>>> I thought about checking for the administrator's "nopat"
>>>>>> setting where you suggest which would limit the effect
>>>>>> of "nopat" to not reporting PAT as enabled to device
>>>>>> drivers who query for PAT availability using pat_enabled().
>>>>>> The main reason I did not do that is that due to the fact
>>>>>> that we cannot write to the PAT MSR, we cannot really
>>>>>> disable PAT. But we come closer to respecting the wishes
>>>>>> of the administrator by configuring the caching modes as
>>>>>> if PAT is actually disabled by the hardware or firmware
>>>>>> when in fact it is not.
>>>>>>
>>>>>> What would you propose logging as a message when
>>>>>> we report PAT as disabled via pat_enabled()? The main
>>>>>> reason I did not choose to check the new variable in the
>>>>>> new 'else if' block is that I could not figure out what to
>>>>>> tell the administrator in that case. I think we would have
>>>>>> to log something like, "nopat is set, but we cannot disable
>>>>>> PAT, doing our best to disable PAT by not reporting PAT
>>>>>> as enabled via pat_enabled(), but that does not guarantee
>>>>>> that kernel drivers and components cannot use PAT if they
>>>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
>>>>>> instead of pat_enabled()." However, I acknowledge WC mappings
>>>>>> would still be disabled because arch_can_pci_mmap_wc() will
>>>>>> be false if pat_enabled() is false.
>>>>>>
>>>>>> Perhaps we also need to log something if we keep the
>>>>>> check for "nopat" where I placed it. We could say something
>>>>>> like: "nopat is set, but we cannot disable hardware/firmware
>>>>>> PAT support, so we are emulating as if there is no PAT support
>>>>>> which puts in place a software view that does not match
>>>>>> hardware."
>>>>>>
>>>>>> No matter what, because we cannot write to PAT MSR in
>>>>>> the Xen PV case, we probably need to log something to
>>>>>> explain the problems associated with trying to honor the
>>>>>> administrator's request. Also, what log level should it be.
>>>>>> Should it be a pr_warn instead of a pr_info?
>>>>>
>>>>> I'm afraid I'm the wrong one to answer logging questions. As you
>>>>> can see from my original patch, I didn't add any new logging (and
>>>>> no addition was requested in the comments that I have got). I also
>>>>> don't think "nopat" has ever meant "disable PAT", as the feature
>>>>> is either there or not. Instead I think it was always seen as
>>>>> "disable fiddling with PAT", which by implication means using
>>>>> whatever is there (if the feature / MSR itself is available).
>>>>
>>>> IIRC, I do think I mentioned in the comments on your patch that
>>>> it would be preferable to mention in the commit message that
>>>> your patch would change the current behavior of "nopat" on
>>>> Xen. The question is, how much do we want to change the
>>>> current behavior of "nopat" on Xen. I think if we have to change
>>>> the current behavior of "nopat" on Xen and if we are going
>>>> to propagate that change to all current stable branches all
>>>> the way back to 4.9.y,, we better make a lot of noise about
>>>> what we are doing here.
>>>>
>>>> Chuck
>>>
>>> And in addition, if we are going to backport this patch to
>>> all current stable branches, we better have a really, really,
>>> good reason for changing the behavior of "nopat" on Xen.
>>>
>>> Does such a reason exist?
>>
>> Well, the simple reason is: It doesn't work the same way under Xen
>> and non-Xen (in turn because, before my patch or whatever equivalent
>> work, things don't work properly anyway, PAT-wise). Yet it definitely
>> ought to behave the same everywhere, imo.
>>
>> Jan
> 
> IOW, you are saying PAT has been broken on Xen for a
> long time, and it is necessary to fix it now not only on
> master, but also on all the stable branches.
> 
> Why is it necessary to do it on all the stable branches?

I'm not saying that's _necessary_ (but I think it would make sense),
and I'm not the one to decide whether or how far to backport this.

Jan
Chuck Zmudzinski July 13, 2022, 3:02 p.m. UTC | #11
On 7/13/2022 9:52 AM, Jan Beulich wrote:
> On 13.07.2022 15:49, Chuck Zmudzinski wrote:
> > On 7/13/2022 9:34 AM, Jan Beulich wrote:
> >> On 13.07.2022 13:10, Chuck Zmudzinski wrote:
> >>> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> >>>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
> >>>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> >>>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
> >>>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> >>>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>>>>>>     *Add the necessary code to incorporate the "nopat" fix
> >>>>>>>>     *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>>>>>>     *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>>>>>>     *Expand the commit message to include relevant parts of the commit
> >>>>>>>>      message of Jan Beulich's proposed patch for this problem
> >>>>>>>>     *Fix 'else if ... {' placement and indentation
> >>>>>>>>     *Remove indication the backport to stable branches is only back to 5.17.y
> >>>>>>>>
> >>>>>>>> I think these changes address all the comments on the original patch
> >>>>>>>>
> >>>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
> >>>>>>>> version of the patch.
> >>>>>>>
> >>>>>>> You've sufficiently altered this change to clearly no longer want my
> >>>>>>> S-o-b; unfortunately in fact I think you broke things:
> >>>>>>
> >>>>>> Well, I hope we can come to an agreement so I have
> >>>>>> your S-o-b. But that would probably require me to remove
> >>>>>> Juergen's R-b.
> >>>>>>
> >>>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>>>>>>  		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>>>>>>  	}
> >>>>>>>>  
> >>>>>>>> -	if (!pat) {
> >>>>>>>> +	if (!pat || pat_force_disabled) {
> >>>>>>>
> >>>>>>> By checking the new variable here ...
> >>>>>>>
> >>>>>>>>  		/*
> >>>>>>>>  		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>>>>>>  		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>>>>>>  		 */
> >>>>>>>>  		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>>>>>>  		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>>>>>
> >>>>>>> ... you put in place a software view which doesn't match hardware. I
> >>>>>>> continue to think that ...
> >>>>>>>
> >>>>>>>> +	} else if (!pat_bp_enabled) {
> >>>>>>>
> >>>>>>> ... the variable wants checking here instead (at which point, yes,
> >>>>>>> this comes quite close to simply being a v2 of my original patch).
> >>>>>>>
> >>>>>>> By using !pat_bp_enabled here you actually broaden where the change
> >>>>>>> would take effect. Iirc Boris had asked to narrow things (besides
> >>>>>>> voicing opposition to this approach altogether). Even without that
> >>>>>>> request I wonder whether you aren't going to far with this.
> >>>>>>>
> >>>>>>> Jan
> >>>>>>
> >>>>>> I thought about checking for the administrator's "nopat"
> >>>>>> setting where you suggest which would limit the effect
> >>>>>> of "nopat" to not reporting PAT as enabled to device
> >>>>>> drivers who query for PAT availability using pat_enabled().
> >>>>>> The main reason I did not do that is that due to the fact
> >>>>>> that we cannot write to the PAT MSR, we cannot really
> >>>>>> disable PAT. But we come closer to respecting the wishes
> >>>>>> of the administrator by configuring the caching modes as
> >>>>>> if PAT is actually disabled by the hardware or firmware
> >>>>>> when in fact it is not.
> >>>>>>
> >>>>>> What would you propose logging as a message when
> >>>>>> we report PAT as disabled via pat_enabled()? The main
> >>>>>> reason I did not choose to check the new variable in the
> >>>>>> new 'else if' block is that I could not figure out what to
> >>>>>> tell the administrator in that case. I think we would have
> >>>>>> to log something like, "nopat is set, but we cannot disable
> >>>>>> PAT, doing our best to disable PAT by not reporting PAT
> >>>>>> as enabled via pat_enabled(), but that does not guarantee
> >>>>>> that kernel drivers and components cannot use PAT if they
> >>>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> >>>>>> instead of pat_enabled()." However, I acknowledge WC mappings
> >>>>>> would still be disabled because arch_can_pci_mmap_wc() will
> >>>>>> be false if pat_enabled() is false.
> >>>>>>
> >>>>>> Perhaps we also need to log something if we keep the
> >>>>>> check for "nopat" where I placed it. We could say something
> >>>>>> like: "nopat is set, but we cannot disable hardware/firmware
> >>>>>> PAT support, so we are emulating as if there is no PAT support
> >>>>>> which puts in place a software view that does not match
> >>>>>> hardware."
> >>>>>>
> >>>>>> No matter what, because we cannot write to PAT MSR in
> >>>>>> the Xen PV case, we probably need to log something to
> >>>>>> explain the problems associated with trying to honor the
> >>>>>> administrator's request. Also, what log level should it be.
> >>>>>> Should it be a pr_warn instead of a pr_info?
> >>>>>
> >>>>> I'm afraid I'm the wrong one to answer logging questions. As you
> >>>>> can see from my original patch, I didn't add any new logging (and
> >>>>> no addition was requested in the comments that I have got). I also
> >>>>> don't think "nopat" has ever meant "disable PAT", as the feature
> >>>>> is either there or not. Instead I think it was always seen as
> >>>>> "disable fiddling with PAT", which by implication means using
> >>>>> whatever is there (if the feature / MSR itself is available).
> >>>>
> >>>> IIRC, I do think I mentioned in the comments on your patch that
> >>>> it would be preferable to mention in the commit message that
> >>>> your patch would change the current behavior of "nopat" on
> >>>> Xen. The question is, how much do we want to change the
> >>>> current behavior of "nopat" on Xen. I think if we have to change
> >>>> the current behavior of "nopat" on Xen and if we are going
> >>>> to propagate that change to all current stable branches all
> >>>> the way back to 4.9.y,, we better make a lot of noise about
> >>>> what we are doing here.
> >>>>
> >>>> Chuck
> >>>
> >>> And in addition, if we are going to backport this patch to
> >>> all current stable branches, we better have a really, really,
> >>> good reason for changing the behavior of "nopat" on Xen.
> >>>
> >>> Does such a reason exist?
> >>
> >> Well, the simple reason is: It doesn't work the same way under Xen
> >> and non-Xen (in turn because, before my patch or whatever equivalent
> >> work, things don't work properly anyway, PAT-wise). Yet it definitely
> >> ought to behave the same everywhere, imo.
> >>
> >> Jan
> > 
> > IOW, you are saying PAT has been broken on Xen for a
> > long time, and it is necessary to fix it now not only on
> > master, but also on all the stable branches.
> > 
> > Why is it necessary to do it on all the stable branches?
>
> I'm not saying that's _necessary_ (but I think it would make sense),
> and I'm not the one to decide whether or how far to backport this.
>
> Jan

What conclusion do you draw from these facts?

1. Linus' regression rule is a high priority in Linux
2. Security concerns are even a higher priority in Linux
3. You and I have been trying to fix a regression for the past two months
4. The ones who can fix the regression have not accepted our patches.
5. I have been asked to help backport my fix to all stable branches.

Chuck
Jan Beulich July 13, 2022, 3:05 p.m. UTC | #12
On 13.07.2022 17:02, Chuck Zmudzinski wrote:
> 5. I have been asked to help backport my fix to all stable branches.

Before anything can sensibly be backported, we need maintainer buyoff on
a patch. And then they may have an opinion how far this is reasonable to
backport.

Jan
Chuck Zmudzinski July 13, 2022, 5:01 p.m. UTC | #13
On 7/13/2022 9:45 AM, Juergen Gross wrote:
> On 13.07.22 15:34, Jan Beulich wrote:
> > On 13.07.2022 13:10, Chuck Zmudzinski wrote:
> >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> >>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
> >>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> >>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
> >>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> >>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>>>>>      *Add the necessary code to incorporate the "nopat" fix
> >>>>>>>      *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>>>>>      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>>>>>      *Expand the commit message to include relevant parts of the commit
> >>>>>>>       message of Jan Beulich's proposed patch for this problem
> >>>>>>>      *Fix 'else if ... {' placement and indentation
> >>>>>>>      *Remove indication the backport to stable branches is only back to 5.17.y
> >>>>>>>
> >>>>>>> I think these changes address all the comments on the original patch
> >>>>>>>
> >>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
> >>>>>>> version of the patch.
> >>>>>>
> >>>>>> You've sufficiently altered this change to clearly no longer want my
> >>>>>> S-o-b; unfortunately in fact I think you broke things:
> >>>>>
> >>>>> Well, I hope we can come to an agreement so I have
> >>>>> your S-o-b. But that would probably require me to remove
> >>>>> Juergen's R-b.
> >>>>>
> >>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>>>>>   		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>>>>>   	}
> >>>>>>>   
> >>>>>>> -	if (!pat) {
> >>>>>>> +	if (!pat || pat_force_disabled) {
> >>>>>>
> >>>>>> By checking the new variable here ...
> >>>>>>
> >>>>>>>   		/*
> >>>>>>>   		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>>>>>   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>>>>>   		 */
> >>>>>>>   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>>>>>   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>>>>
> >>>>>> ... you put in place a software view which doesn't match hardware. I
> >>>>>> continue to think that ...
> >>>>>>
> >>>>>>> +	} else if (!pat_bp_enabled) {
> >>>>>>
> >>>>>> ... the variable wants checking here instead (at which point, yes,
> >>>>>> this comes quite close to simply being a v2 of my original patch).
> >>>>>>
> >>>>>> By using !pat_bp_enabled here you actually broaden where the change
> >>>>>> would take effect. Iirc Boris had asked to narrow things (besides
> >>>>>> voicing opposition to this approach altogether). Even without that
> >>>>>> request I wonder whether you aren't going to far with this.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> I thought about checking for the administrator's "nopat"
> >>>>> setting where you suggest which would limit the effect
> >>>>> of "nopat" to not reporting PAT as enabled to device
> >>>>> drivers who query for PAT availability using pat_enabled().
> >>>>> The main reason I did not do that is that due to the fact
> >>>>> that we cannot write to the PAT MSR, we cannot really
> >>>>> disable PAT. But we come closer to respecting the wishes
> >>>>> of the administrator by configuring the caching modes as
> >>>>> if PAT is actually disabled by the hardware or firmware
> >>>>> when in fact it is not.
> >>>>>
> >>>>> What would you propose logging as a message when
> >>>>> we report PAT as disabled via pat_enabled()? The main
> >>>>> reason I did not choose to check the new variable in the
> >>>>> new 'else if' block is that I could not figure out what to
> >>>>> tell the administrator in that case. I think we would have
> >>>>> to log something like, "nopat is set, but we cannot disable
> >>>>> PAT, doing our best to disable PAT by not reporting PAT
> >>>>> as enabled via pat_enabled(), but that does not guarantee
> >>>>> that kernel drivers and components cannot use PAT if they
> >>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> >>>>> instead of pat_enabled()." However, I acknowledge WC mappings
> >>>>> would still be disabled because arch_can_pci_mmap_wc() will
> >>>>> be false if pat_enabled() is false.
> >>>>>
> >>>>> Perhaps we also need to log something if we keep the
> >>>>> check for "nopat" where I placed it. We could say something
> >>>>> like: "nopat is set, but we cannot disable hardware/firmware
> >>>>> PAT support, so we are emulating as if there is no PAT support
> >>>>> which puts in place a software view that does not match
> >>>>> hardware."
> >>>>>
> >>>>> No matter what, because we cannot write to PAT MSR in
> >>>>> the Xen PV case, we probably need to log something to
> >>>>> explain the problems associated with trying to honor the
> >>>>> administrator's request. Also, what log level should it be.
> >>>>> Should it be a pr_warn instead of a pr_info?
> >>>>
> >>>> I'm afraid I'm the wrong one to answer logging questions. As you
> >>>> can see from my original patch, I didn't add any new logging (and
> >>>> no addition was requested in the comments that I have got). I also
> >>>> don't think "nopat" has ever meant "disable PAT", as the feature
> >>>> is either there or not. Instead I think it was always seen as
> >>>> "disable fiddling with PAT", which by implication means using
> >>>> whatever is there (if the feature / MSR itself is available).
> >>>
> >>> IIRC, I do think I mentioned in the comments on your patch that
> >>> it would be preferable to mention in the commit message that
> >>> your patch would change the current behavior of "nopat" on
> >>> Xen. The question is, how much do we want to change the
> >>> current behavior of "nopat" on Xen. I think if we have to change
> >>> the current behavior of "nopat" on Xen and if we are going
> >>> to propagate that change to all current stable branches all
> >>> the way back to 4.9.y,, we better make a lot of noise about
> >>> what we are doing here.
> >>>
> >>> Chuck
> >>
> >> And in addition, if we are going to backport this patch to
> >> all current stable branches, we better have a really, really,
> >> good reason for changing the behavior of "nopat" on Xen.
> >>
> >> Does such a reason exist?
> > 
> > Well, the simple reason is: It doesn't work the same way under Xen
> > and non-Xen (in turn because, before my patch or whatever equivalent
> > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > ought to behave the same everywhere, imo.
>
> There is Documentation/x86/pat.rst which rather clearly states, how
> "nopat" is meant to work. It should not change the contents of the
> PAT MSR and keep it just as it was set at boot time (the doc talks
> about the "BIOS" setting of the MSR, and I guess in the Xen case
> the hypervisor is kind of acting as the BIOS).
>
> The question is, whether "nopat" needs to be translated to
> pat_enabled() returning "false". I've found exactly one case where
> "nopat" seems to be required for a driver to work correctly, but
> this driver (drivers/media/pci/ivtv/ivtvfb.c) seems to rely on
> MTRR availability, which isn't supported in Xen PV guests.
>
> Other than that the "nopat" option seems to be a chicken bit from
> the early days of PAT usage, which probably isn't really needed any
> more. So I wouldn't be worried to drop "nopat" having any effect
> on the system in Xen PV guests.
>
> On bare metal it should still work, of course, if only for said
> driver.
>
> >> Or perhaps, Juergen, could you
> >> accept a v3 of my patch that does not need to decide
> >> how to backport the change to "nopat" to the stable branches
> >> that are affected by the current behavior of "nopat" on Xen?
>
> Note that it is not me to accept such a patch, this would be one
> of the x86 maintainers (e.g. Boris).
>
>
> Juergen

Thanks, I think you have given me enough information to try a
v3 without a fix for "nopat" that you might be willing to give
your Reviewed-by to.

Unfortunately, I am not optimistic about coming to an agreement
with Jan. I do not want to sign-off on a patch that makes the
kind of changes to how "nopat" behaves with the attitude, "I'm not the
one to ask about logging." We have to give the administrator accurate
information about what we are doing with useful logs, and I
cannot sign-off on Jan's approach without giving the administrator
a sensible and honest message about what effects, or lack of effects,
the "nopat" option will have with his approach. Perhaps Jan will change
his mind and sign-off with me on a v3 that will explain with a log
message the limitations of the "nopat" option using his approach.

I am just against not being open about any issues that are present
in what we always call "open source software."

Chuck
Chuck Zmudzinski July 13, 2022, 7:07 p.m. UTC | #14
On 7/13/2022 9:45 AM, Juergen Gross wrote:
> >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> >> And in addition, if we are going to backport this patch to
> >> all current stable branches, we better have a really, really,
> >> good reason for changing the behavior of "nopat" on Xen.
> >>
> >> Does such a reason exist?
> > 
> > Well, the simple reason is: It doesn't work the same way under Xen
> > and non-Xen (in turn because, before my patch or whatever equivalent
> > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > ought to behave the same everywhere, imo.
>
> There is Documentation/x86/pat.rst which rather clearly states, how
> "nopat" is meant to work. It should not change the contents of the
> PAT MSR and keep it just as it was set at boot time (the doc talks
> about the "BIOS" setting of the MSR, and I guess in the Xen case
> the hypervisor is kind of acting as the BIOS).
>
> The question is, whether "nopat" needs to be translated to
> pat_enabled() returning "false".

When I started working on a re-factoring effort of the logic
surrounding pat_enabled(), I noticed there are five different
reasons in the current code for setting pat_disabled to true,
which IMO is what should be a redundant variable that should
always be equal !pat_enabled() and !pat_bp_enabled, but that
unfortunately is not the case. The five reasons for setting
pat_disabled to true are given as message strings:

1. "MTRRs disabled, skipping PAT initialization too."
2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
3. "PAT support disabled via boot option."
4. "PAT not supported by the CPU."
5. "PAT support disabled by the firmware."

The only effect of setting pat_disabled to true is to inhibit
the execution of pat_init(), but it does not inhibit the execution
of init_cache_modes(), which is for handling all these cases
when pat_init() was skipped. The Xen case is one of those
cases, so in the Xen case, pat_disabled will be true yet the
only way to fix the current regression and the five-year-old
commit is by setting pat_bp_enabled to true so pat_enabled()
will return true. So to fix the five-year-old commit, we must have

pat_enabled() != pat_disabled

Something is wrong with this logic, that is why I wanted to precede
my fix with some re-factoring that will change some variable
and function names and modify some comments before trying
to fix the five-year-old commit, so that we will never have a situation
when pat_enabled() != pat_disabled.

Chuck
Chuck Zmudzinski July 13, 2022, 7:22 p.m. UTC | #15
On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
> On 7/13/2022 9:45 AM, Juergen Gross wrote:
> > >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> > >> And in addition, if we are going to backport this patch to
> > >> all current stable branches, we better have a really, really,
> > >> good reason for changing the behavior of "nopat" on Xen.
> > >>
> > >> Does such a reason exist?
> > > 
> > > Well, the simple reason is: It doesn't work the same way under Xen
> > > and non-Xen (in turn because, before my patch or whatever equivalent
> > > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > > ought to behave the same everywhere, imo.
> >
> > There is Documentation/x86/pat.rst which rather clearly states, how
> > "nopat" is meant to work. It should not change the contents of the
> > PAT MSR and keep it just as it was set at boot time (the doc talks
> > about the "BIOS" setting of the MSR, and I guess in the Xen case
> > the hypervisor is kind of acting as the BIOS).
> >
> > The question is, whether "nopat" needs to be translated to
> > pat_enabled() returning "false".
>
> When I started working on a re-factoring effort of the logic
> surrounding pat_enabled(), I noticed there are five different
> reasons in the current code for setting pat_disabled to true,
> which IMO is what should be a redundant variable that should
> always be equal !pat_enabled() and !pat_bp_enabled, but that
> unfortunately is not the case. The five reasons for setting
> pat_disabled to true are given as message strings:
>
> 1. "MTRRs disabled, skipping PAT initialization too."
> 2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
> 3. "PAT support disabled via boot option."
> 4. "PAT not supported by the CPU."
> 5. "PAT support disabled by the firmware."
>
> The only effect of setting pat_disabled to true is to inhibit
> the execution of pat_init(), but it does not inhibit the execution
> of init_cache_modes(), which is for handling all these cases
> when pat_init() was skipped. The Xen case is one of those
> cases, so in the Xen case, pat_disabled will be true yet the
> only way to fix the current regression and the five-year-old
> commit is by setting pat_bp_enabled to true so pat_enabled()
> will return true. So to fix the five-year-old commit, we must have
>
> pat_enabled() != pat_disabled
>
> Something is wrong with this logic, that is why I wanted to precede
> my fix with some re-factoring that will change some variable
> and function names and modify some comments before trying
> to fix the five-year-old commit, so that we will never have a situation
> when pat_enabled() != pat_disabled.
>
> Chuck
Sorry, I meant to say,

To fix the five-year-old commit, we must have

pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,

and there is something wrong with that logic.

Chuck
Chuck Zmudzinski July 13, 2022, 7:38 p.m. UTC | #16
On 7/13/2022 3:22 PM, Chuck Zmudzinski wrote:
> On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
> > On 7/13/2022 9:45 AM, Juergen Gross wrote:
> > > >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> > > >> And in addition, if we are going to backport this patch to
> > > >> all current stable branches, we better have a really, really,
> > > >> good reason for changing the behavior of "nopat" on Xen.
> > > >>
> > > >> Does such a reason exist?
> > > > 
> > > > Well, the simple reason is: It doesn't work the same way under Xen
> > > > and non-Xen (in turn because, before my patch or whatever equivalent
> > > > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > > > ought to behave the same everywhere, imo.
> > >
> > > There is Documentation/x86/pat.rst which rather clearly states, how
> > > "nopat" is meant to work. It should not change the contents of the
> > > PAT MSR and keep it just as it was set at boot time (the doc talks
> > > about the "BIOS" setting of the MSR, and I guess in the Xen case
> > > the hypervisor is kind of acting as the BIOS).
> > >
> > > The question is, whether "nopat" needs to be translated to
> > > pat_enabled() returning "false".
> >
> > When I started working on a re-factoring effort of the logic
> > surrounding pat_enabled(), I noticed there are five different
> > reasons in the current code for setting pat_disabled to true,
> > which IMO is what should be a redundant variable that should
> > always be equal !pat_enabled() and !pat_bp_enabled, but that
> > unfortunately is not the case. The five reasons for setting
> > pat_disabled to true are given as message strings:
> >
> > 1. "MTRRs disabled, skipping PAT initialization too."
> > 2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
> > 3. "PAT support disabled via boot option."
> > 4. "PAT not supported by the CPU."
> > 5. "PAT support disabled by the firmware."
> >
> > The only effect of setting pat_disabled to true is to inhibit
> > the execution of pat_init(), but it does not inhibit the execution
> > of init_cache_modes(), which is for handling all these cases
> > when pat_init() was skipped. The Xen case is one of those
> > cases, so in the Xen case, pat_disabled will be true yet the
> > only way to fix the current regression and the five-year-old
> > commit is by setting pat_bp_enabled to true so pat_enabled()
> > will return true. So to fix the five-year-old commit, we must have
> >
> > pat_enabled() != pat_disabled
> >
> > Something is wrong with this logic, that is why I wanted to precede
> > my fix with some re-factoring that will change some variable
> > and function names and modify some comments before trying
> > to fix the five-year-old commit, so that we will never have a situation
> > when pat_enabled() != pat_disabled.
> >
> > Chuck
> Sorry, I meant to say,
>
> To fix the five-year-old commit, we must have
>
> pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,
>
> and there is something wrong with that logic.
>
> Chuck

So to summarize, I think this means that to be comfortable
fixing the five-year-old commit and the current regression
by artificially setting pat_bp_enabled and pat_enabled() to
true, something which both my patch and Jan's patch does,
we need to come to a new understanding of what the
static boolean variable pat_disabled in
arch/x86/mm/pat/memtype.c in the code really means.

The fact is, we have a regression and the only fix we
can find is to try to make pat_enabled() == pat_disabled

I need to stop thinking about this for a while. It is time
for those who have authority to fix this regression to
make some comments about how they think this should
be fixed.

Chuck
Chuck Zmudzinski July 13, 2022, 10:12 p.m. UTC | #17
On 7/13/2022 9:45 AM, Juergen Gross wrote:
> On 13.07.22 15:34, Jan Beulich wrote:
> > On 13.07.2022 13:10, Chuck Zmudzinski wrote:
> >> On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
> >>> On 7/13/2022 5:09 AM, Jan Beulich wrote:
> >>>> On 13.07.2022 10:51, Chuck Zmudzinski wrote:
> >>>>> On 7/13/22 2:18 AM, Jan Beulich wrote:
> >>>>>> On 13.07.2022 03:36, Chuck Zmudzinski wrote:
> >>>>>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>>>>>      *Add the necessary code to incorporate the "nopat" fix
> >>>>>>>      *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>>>>>      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>>>>>      *Expand the commit message to include relevant parts of the commit
> >>>>>>>       message of Jan Beulich's proposed patch for this problem
> >>>>>>>      *Fix 'else if ... {' placement and indentation
> >>>>>>>      *Remove indication the backport to stable branches is only back to 5.17.y
> >>>>>>>
> >>>>>>> I think these changes address all the comments on the original patch
> >>>>>>>
> >>>>>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>>>>>> include Jan's idea for fixing "nopat" that was missing from the first
> >>>>>>> version of the patch.
> >>>>>>
> >>>>>> You've sufficiently altered this change to clearly no longer want my
> >>>>>> S-o-b; unfortunately in fact I think you broke things:
> >>>>>
> >>>>> Well, I hope we can come to an agreement so I have
> >>>>> your S-o-b. But that would probably require me to remove
> >>>>> Juergen's R-b.
> >>>>>
> >>>>>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>>>>>   		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>>>>>   	}
> >>>>>>>   
> >>>>>>> -	if (!pat) {
> >>>>>>> +	if (!pat || pat_force_disabled) {
> >>>>>>
> >>>>>> By checking the new variable here ...
> >>>>>>
> >>>>>>>   		/*
> >>>>>>>   		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>>>>>   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>>>>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>>>>>   		 */
> >>>>>>>   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>>>>>   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>>>>>
> >>>>>> ... you put in place a software view which doesn't match hardware. I
> >>>>>> continue to think that ...
> >>>>>>
> >>>>>>> +	} else if (!pat_bp_enabled) {
> >>>>>>
> >>>>>> ... the variable wants checking here instead (at which point, yes,
> >>>>>> this comes quite close to simply being a v2 of my original patch).
> >>>>>>
> >>>>>> By using !pat_bp_enabled here you actually broaden where the change
> >>>>>> would take effect. Iirc Boris had asked to narrow things (besides
> >>>>>> voicing opposition to this approach altogether). Even without that
> >>>>>> request I wonder whether you aren't going to far with this.
> >>>>>>
> >>>>>> Jan
> >>>>>
> >>>>> I thought about checking for the administrator's "nopat"
> >>>>> setting where you suggest which would limit the effect
> >>>>> of "nopat" to not reporting PAT as enabled to device
> >>>>> drivers who query for PAT availability using pat_enabled().
> >>>>> The main reason I did not do that is that due to the fact
> >>>>> that we cannot write to the PAT MSR, we cannot really
> >>>>> disable PAT. But we come closer to respecting the wishes
> >>>>> of the administrator by configuring the caching modes as
> >>>>> if PAT is actually disabled by the hardware or firmware
> >>>>> when in fact it is not.
> >>>>>
> >>>>> What would you propose logging as a message when
> >>>>> we report PAT as disabled via pat_enabled()? The main
> >>>>> reason I did not choose to check the new variable in the
> >>>>> new 'else if' block is that I could not figure out what to
> >>>>> tell the administrator in that case. I think we would have
> >>>>> to log something like, "nopat is set, but we cannot disable
> >>>>> PAT, doing our best to disable PAT by not reporting PAT
> >>>>> as enabled via pat_enabled(), but that does not guarantee
> >>>>> that kernel drivers and components cannot use PAT if they
> >>>>> query for PAT support using boot_cpu_has(X86_FEATURE_PAT)
> >>>>> instead of pat_enabled()." However, I acknowledge WC mappings
> >>>>> would still be disabled because arch_can_pci_mmap_wc() will
> >>>>> be false if pat_enabled() is false.
> >>>>>
> >>>>> Perhaps we also need to log something if we keep the
> >>>>> check for "nopat" where I placed it. We could say something
> >>>>> like: "nopat is set, but we cannot disable hardware/firmware
> >>>>> PAT support, so we are emulating as if there is no PAT support
> >>>>> which puts in place a software view that does not match
> >>>>> hardware."
> >>>>>
> >>>>> No matter what, because we cannot write to PAT MSR in
> >>>>> the Xen PV case, we probably need to log something to
> >>>>> explain the problems associated with trying to honor the
> >>>>> administrator's request. Also, what log level should it be.
> >>>>> Should it be a pr_warn instead of a pr_info?
> >>>>
> >>>> I'm afraid I'm the wrong one to answer logging questions. As you
> >>>> can see from my original patch, I didn't add any new logging (and
> >>>> no addition was requested in the comments that I have got). I also
> >>>> don't think "nopat" has ever meant "disable PAT", as the feature
> >>>> is either there or not. Instead I think it was always seen as
> >>>> "disable fiddling with PAT", which by implication means using
> >>>> whatever is there (if the feature / MSR itself is available).
> >>>
> >>> IIRC, I do think I mentioned in the comments on your patch that
> >>> it would be preferable to mention in the commit message that
> >>> your patch would change the current behavior of "nopat" on
> >>> Xen. The question is, how much do we want to change the
> >>> current behavior of "nopat" on Xen. I think if we have to change
> >>> the current behavior of "nopat" on Xen and if we are going
> >>> to propagate that change to all current stable branches all
> >>> the way back to 4.9.y,, we better make a lot of noise about
> >>> what we are doing here.
> >>>
> >>> Chuck
> >>
> >> And in addition, if we are going to backport this patch to
> >> all current stable branches, we better have a really, really,
> >> good reason for changing the behavior of "nopat" on Xen.
> >>
> >> Does such a reason exist?
> > 
> > Well, the simple reason is: It doesn't work the same way under Xen
> > and non-Xen (in turn because, before my patch or whatever equivalent
> > work, things don't work properly anyway, PAT-wise). Yet it definitely
> > ought to behave the same everywhere, imo.
>
> There is Documentation/x86/pat.rst which rather clearly states, how
> "nopat" is meant to work. It should not change the contents of the
> PAT MSR and keep it just as it was set at boot time (the doc talks
> about the "BIOS" setting of the MSR, and I guess in the Xen case
> the hypervisor is kind of acting as the BIOS).

If that is the true meaning of "nopat", then the pat_enabled() test we
currently have in the i915 driver is the wrong test for the capability
of the
CPU to use the fast WC type pages for video frames access because it is
possible for pat_enabled() to be false and "nopat" set with its official
meaning, and still have a CPU with WC cache mode capability.

If we accept pat_enabled() as implied WC cache mode support, why not also
accept (!pat_enabled && boot_cpu_has(X86_FEATURE_HYPERVISOR)) also
as implied WC cache mode support? That is what Jan's patch effectively does.
He just possibly places his patch in the wrong portion of the Linux tree
to be consistent with the official meaning of "nopat" and pat_enabled().

We could implement Jan's fix instead in the i915 driver instead if we need
to be consistent with the official meaning of "nopat" and pat_enabled().

I could make that a v3 of my patch - and try the i915 maintainers instead of
the x86 maintainers to provide the fix. But before I do that, can someone
on this list of 20 recipients tell me why none of you have fixed this nasty
regression? I am new to trying to contribute to Linux and the whole
experience is frustrating when all you get is stonewalling from the official
maintainers. So why not just someone step up and do this fix?

In the meantime, Juergen can start working on cleaning up the x86/PAT
code so it can provide the i915 driver with a test not for PAT, but for the
WC page caching mode support that works in all supported environments,
including Xen. Currently there is no such test available. Juergen proposed
one but it failed to accurately test for WC cache mode capability on my
Xen workstation. Until the x86 subsystem developers can provide the rest
of Linux with an accurate test for the WC caching mode, we have to
settle for
less than a pure and perfect solution if we are serious about following
Linus' regression rule and accept a quick fix to a nasty regression while
we wait for a better solution that will hopefully come later.

Chuck
Thorsten Leemhuis July 14, 2022, 5:30 a.m. UTC | #18
On 13.07.22 03:36, Chuck Zmudzinski wrote:
> The commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> incorrectly failed to account for the case in init_cache_modes() when
> CPUs do support PAT and falsely reported PAT to be disabled when in
> fact PAT is enabled. In some environments, notably in Xen PV domains,
> MTRR is disabled but PAT is still enabled, and that is the case
> that the aforementioned commit failed to account for.
> 
> As an unfortunate consequnce, the pat_enabled() function currently does
> not correctly report that PAT is enabled in such environments. The fix
> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> to account for.
> 
> This approach arranges for pat_enabled() to return true in the Xen PV
> environment without undermining the rest of PAT MSR management logic
> that considers PAT to be disabled: Specifically, no writes to the PAT
> MSR should occur.
> 
> This patch fixes a regression that some users are experiencing with
> Linux as a Xen Dom0 driving particular Intel graphics devices by
> correctly reporting to the Intel i915 driver that PAT is enabled where
> previously it was falsely reporting that PAT is disabled. Some users
> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> Dom0 are experiencing reduced graphics performance because the keying of
> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> means that in particular graphics frame buffer accesses are quite a bit
> less performant than possible without this patch.
> 
> Also, with the current code, in the Xen PV environment, PAT will not be
> disabled if the administrator sets the "nopat" boot option. Introduce
> a new boolean variable, pat_force_disable, to forcibly disable PAT
> when the administrator sets the "nopat" option to override the default
> behavior of using the PAT configuration that Xen has provided.
> 
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
> 
> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")

BTW, "submitting-patches.rst" says it should just be "the first 12
characters of the SHA-1 ID"

> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>

Sorry, have to ask: is this supposed to finally fix this regression?
https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

If yes, please include Link: and Reported-by: tags, as explained in
submitting-patches.rst (I only care about the link tag, as I'm tacking
that regression).

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
Jürgen Groß July 14, 2022, 5:40 a.m. UTC | #19
On 13.07.22 03:36, Chuck Zmudzinski wrote:
> The commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> incorrectly failed to account for the case in init_cache_modes() when
> CPUs do support PAT and falsely reported PAT to be disabled when in
> fact PAT is enabled. In some environments, notably in Xen PV domains,
> MTRR is disabled but PAT is still enabled, and that is the case
> that the aforementioned commit failed to account for.
> 
> As an unfortunate consequnce, the pat_enabled() function currently does
> not correctly report that PAT is enabled in such environments. The fix
> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> to account for.
> 
> This approach arranges for pat_enabled() to return true in the Xen PV
> environment without undermining the rest of PAT MSR management logic
> that considers PAT to be disabled: Specifically, no writes to the PAT
> MSR should occur.
> 
> This patch fixes a regression that some users are experiencing with
> Linux as a Xen Dom0 driving particular Intel graphics devices by
> correctly reporting to the Intel i915 driver that PAT is enabled where
> previously it was falsely reporting that PAT is disabled. Some users
> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> Dom0 are experiencing reduced graphics performance because the keying of
> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> means that in particular graphics frame buffer accesses are quite a bit
> less performant than possible without this patch.
> 
> Also, with the current code, in the Xen PV environment, PAT will not be
> disabled if the administrator sets the "nopat" boot option. Introduce
> a new boolean variable, pat_force_disable, to forcibly disable PAT
> when the administrator sets the "nopat" option to override the default
> behavior of using the PAT configuration that Xen has provided.
> 
> For the new boolean to live in .init.data, init_cache_modes() also needs
> moving to .init.text (where it could/should have lived already before).
> 
> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@vger.kernel.org
> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> ---
> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>      *Add the necessary code to incorporate the "nopat" fix
>      *void init_cache_modes(void) -> void __init init_cache_modes(void)
>      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>      *Expand the commit message to include relevant parts of the commit
>       message of Jan Beulich's proposed patch for this problem
>      *Fix 'else if ... {' placement and indentation
>      *Remove indication the backport to stable branches is only back to 5.17.y
> 
> I think these changes address all the comments on the original patch
> 
> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> include Jan's idea for fixing "nopat" that was missing from the first
> version of the patch.
> 
> The patch has been tested, it works as expected with and without nopat
> in the Xen PV Dom0 environment. That is, "nopat" causes the system to
> exhibit the effects and problems that lack of PAT support causes.
> 
>   arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
>   1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> index d5ef64ddd35e..10a37d309d23 100644
> --- a/arch/x86/mm/pat/memtype.c
> +++ b/arch/x86/mm/pat/memtype.c
> @@ -62,6 +62,7 @@
>   
>   static bool __read_mostly pat_bp_initialized;
>   static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
>   static bool __read_mostly pat_bp_enabled;
>   static bool __read_mostly pat_cm_initialized;
>   
> @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
>   static int __init nopat(char *str)
>   {
>   	pat_disable("PAT support disabled via boot option.");
> +	pat_force_disabled = true;
>   	return 0;
>   }
>   early_param("nopat", nopat);
> @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
>   	wrmsrl(MSR_IA32_CR_PAT, pat);
>   }
>   
> -void init_cache_modes(void)
> +void __init init_cache_modes(void)
>   {
>   	u64 pat = 0;
>   
> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>   		rdmsrl(MSR_IA32_CR_PAT, pat);
>   	}
>   
> -	if (!pat) {
> +	if (!pat || pat_force_disabled) {

Can we just remove this modification and ...

>   		/*
>   		 * No PAT. Emulate the PAT table that corresponds to the two
>   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>   		 */
>   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> +	} else if (!pat_bp_enabled) {

... use

+	} else if (!pat_bp_enabled && !pat_force_disabled) {

here?

This will result in the desired outcome in all cases IMO: If PAT wasn't
disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
MTRR), then PAT will be set to "enabled" again.

> +		/*
> +		 * In some environments, specifically Xen PV, PAT
> +		 * initialization is skipped because MTRRs are disabled even
> +		 * though PAT is available. In such environments, set PAT to
> +		 * enabled to correctly indicate to callers of pat_enabled()
> +		 * that CPU support for PAT is available.
> +		 */
> +		pat_bp_enabled = true;
> +		pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
>   	}
>   
>   	__init_cache_modes(pat);


Juergen
Jan Beulich July 14, 2022, 6:28 a.m. UTC | #20
On 14.07.2022 07:40, Juergen Gross wrote:
> On 13.07.22 03:36, Chuck Zmudzinski wrote:
>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>   		rdmsrl(MSR_IA32_CR_PAT, pat);
>>   	}
>>   
>> -	if (!pat) {
>> +	if (!pat || pat_force_disabled) {
> 
> Can we just remove this modification and ...
> 
>>   		/*
>>   		 * No PAT. Emulate the PAT table that corresponds to the two
>>   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>   		 */
>>   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>> +	} else if (!pat_bp_enabled) {
> 
> ... use
> 
> +	} else if (!pat_bp_enabled && !pat_force_disabled) {
> 
> here?
> 
> This will result in the desired outcome in all cases IMO: If PAT wasn't
> disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
> Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
> MTRR), then PAT will be set to "enabled" again.

Just to mention it explicitly: If the value _read_ from the MSR is zero,
we're hosed anyway, as then we can only express a single memory type (UC)
in all PTEs. The zero case we mean to deal with is when reading the MSR
wasn't valid to try.

Jan
Chuck Zmudzinski July 15, 2022, 2:07 a.m. UTC | #21
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
> On 13.07.22 03:36, Chuck Zmudzinski wrote:
> > The commit 99c13b8c8896d7bcb92753bf
> > ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > incorrectly failed to account for the case in init_cache_modes() when
> > CPUs do support PAT and falsely reported PAT to be disabled when in
> > fact PAT is enabled. In some environments, notably in Xen PV domains,
> > MTRR is disabled but PAT is still enabled, and that is the case
> > that the aforementioned commit failed to account for.
> > 
> > As an unfortunate consequnce, the pat_enabled() function currently does
> > not correctly report that PAT is enabled in such environments. The fix
> > is implemented in init_cache_modes() by setting pat_bp_enabled to true
> > in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> > ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> > to account for.
> > 
> > This approach arranges for pat_enabled() to return true in the Xen PV
> > environment without undermining the rest of PAT MSR management logic
> > that considers PAT to be disabled: Specifically, no writes to the PAT
> > MSR should occur.
> > 
> > This patch fixes a regression that some users are experiencing with
> > Linux as a Xen Dom0 driving particular Intel graphics devices by
> > correctly reporting to the Intel i915 driver that PAT is enabled where
> > previously it was falsely reporting that PAT is disabled. Some users
> > are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> > Dom0 are experiencing reduced graphics performance because the keying of
> > the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> > means that in particular graphics frame buffer accesses are quite a bit
> > less performant than possible without this patch.
> > 
> > Also, with the current code, in the Xen PV environment, PAT will not be
> > disabled if the administrator sets the "nopat" boot option. Introduce
> > a new boolean variable, pat_force_disable, to forcibly disable PAT
> > when the administrator sets the "nopat" option to override the default
> > behavior of using the PAT configuration that Xen has provided.
> > 
> > For the new boolean to live in .init.data, init_cache_modes() also needs
> > moving to .init.text (where it could/should have lived already before).
> > 
> > Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>
> BTW, "submitting-patches.rst" says it should just be "the first 12
> characters of the SHA-1 ID"

Actually it says "at least", so more that 12 is It is probably safest
to put the entire SHA-1 ID in because of the possibility of
a collision. At least that's how I understand what
submitting-patches.rst.

>
> > Co-developed-by: Jan Beulich <jbeulich@suse.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>
> Sorry, have to ask: is this supposed to finally fix this regression?
> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/

Yes that's the first report I saw to lkml about this isssue. So if I submit
a v3 I will include that. But my patch does not have a sign-off from the
Co-developer as I mentioned in a message earlier today, and the
discussion that has ensued leads me to think a better solution is to just
revert the commit in the i915 driver instead, and leave the bigger questions
for Juergen Gross and his plans to re-work the x86/PAT code in September,
as he said on this thread in the last couple of days. This patch is dead
now,
as far as I can tell, because the Co-developer is not cooperating.

Chuck
>
> If yes, please include Link: and Reported-by: tags, as explained in
> submitting-patches.rst (I only care about the link tag, as I'm tacking
> that regression).
>
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>
> P.S.: As the Linux kernel's regression tracker I deal with a lot of
> reports and sometimes miss something important when writing mails like
> this. If that's the case here, don't hesitate to tell me in a public
> reply, it's in everyone's interest to set the public record straight.
Chuck Zmudzinski July 15, 2022, 2:19 a.m. UTC | #22
On 7/14/2022 1:40 AM, Juergen Gross wrote:
> On 13.07.22 03:36, Chuck Zmudzinski wrote:
> > The commit 99c13b8c8896d7bcb92753bf
> > ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > incorrectly failed to account for the case in init_cache_modes() when
> > CPUs do support PAT and falsely reported PAT to be disabled when in
> > fact PAT is enabled. In some environments, notably in Xen PV domains,
> > MTRR is disabled but PAT is still enabled, and that is the case
> > that the aforementioned commit failed to account for.
> > 
> > As an unfortunate consequnce, the pat_enabled() function currently does
> > not correctly report that PAT is enabled in such environments. The fix
> > is implemented in init_cache_modes() by setting pat_bp_enabled to true
> > in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> > ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> > to account for.
> > 
> > This approach arranges for pat_enabled() to return true in the Xen PV
> > environment without undermining the rest of PAT MSR management logic
> > that considers PAT to be disabled: Specifically, no writes to the PAT
> > MSR should occur.
> > 
> > This patch fixes a regression that some users are experiencing with
> > Linux as a Xen Dom0 driving particular Intel graphics devices by
> > correctly reporting to the Intel i915 driver that PAT is enabled where
> > previously it was falsely reporting that PAT is disabled. Some users
> > are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> > Dom0 are experiencing reduced graphics performance because the keying of
> > the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> > means that in particular graphics frame buffer accesses are quite a bit
> > less performant than possible without this patch.
> > 
> > Also, with the current code, in the Xen PV environment, PAT will not be
> > disabled if the administrator sets the "nopat" boot option. Introduce
> > a new boolean variable, pat_force_disable, to forcibly disable PAT
> > when the administrator sets the "nopat" option to override the default
> > behavior of using the PAT configuration that Xen has provided.
> > 
> > For the new boolean to live in .init.data, init_cache_modes() also needs
> > moving to .init.text (where it could/should have lived already before).
> > 
> > Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > Co-developed-by: Jan Beulich <jbeulich@suse.com>
> > Cc: stable@vger.kernel.org
> > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> > ---
> > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >      *Add the necessary code to incorporate the "nopat" fix
> >      *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >      *Expand the commit message to include relevant parts of the commit
> >       message of Jan Beulich's proposed patch for this problem
> >      *Fix 'else if ... {' placement and indentation
> >      *Remove indication the backport to stable branches is only back to 5.17.y
> > 
> > I think these changes address all the comments on the original patch
> > 
> > I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > include Jan's idea for fixing "nopat" that was missing from the first
> > version of the patch.
> > 
> > The patch has been tested, it works as expected with and without nopat
> > in the Xen PV Dom0 environment. That is, "nopat" causes the system to
> > exhibit the effects and problems that lack of PAT support causes.
> > 
> >   arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
> >   1 file changed, 14 insertions(+), 2 deletions(-)
> > 
> > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > index d5ef64ddd35e..10a37d309d23 100644
> > --- a/arch/x86/mm/pat/memtype.c
> > +++ b/arch/x86/mm/pat/memtype.c
> > @@ -62,6 +62,7 @@
> >   
> >   static bool __read_mostly pat_bp_initialized;
> >   static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> > +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> >   static bool __read_mostly pat_bp_enabled;
> >   static bool __read_mostly pat_cm_initialized;
> >   
> > @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> >   static int __init nopat(char *str)
> >   {
> >   	pat_disable("PAT support disabled via boot option.");
> > +	pat_force_disabled = true;
> >   	return 0;
> >   }
> >   early_param("nopat", nopat);
> > @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> >   	wrmsrl(MSR_IA32_CR_PAT, pat);
> >   }
> >   
> > -void init_cache_modes(void)
> > +void __init init_cache_modes(void)
> >   {
> >   	u64 pat = 0;
> >   
> > @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >   		rdmsrl(MSR_IA32_CR_PAT, pat);
> >   	}
> >   
> > -	if (!pat) {
> > +	if (!pat || pat_force_disabled) {
>
> Can we just remove this modification and ...
>
> >   		/*
> >   		 * No PAT. Emulate the PAT table that corresponds to the two
> >   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >   		 */
> >   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > +	} else if (!pat_bp_enabled) {
>
> ... use
>
> +	} else if (!pat_bp_enabled && !pat_force_disabled) {
>
> here?
>
> This will result in the desired outcome in all cases IMO: If PAT wasn't
> disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
> Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
> MTRR), then PAT will be set to "enabled" again.

With that, you can also completely remove the new Boolean - it
will be a meaningless variable wasting memory. This will also make
my patch more or less do what Jan's patch does - the "nopat" option
will not cause the situation when the PAT MSR does not match the
software view. So you are basically proposing just going back to
my original patch, after fixing the style problems, of course. That
also would solve the problem of needing Jan's S-o-b. I am inclined,
however, to wait for a maintainer who has power to actually do the
commit, to make a comment. Your R-b to my v2 did not have much clout
with the actual maintainers, as far as I can tell. I am somewhat annoyed
that it was at your suggestion that my v2 ended up confusing the
main issue, the regression, with the red herring of the "nopat"
option.

Chuck


> > +		/*
> > +		 * In some environments, specifically Xen PV, PAT
> > +		 * initialization is skipped because MTRRs are disabled even
> > +		 * though PAT is available. In such environments, set PAT to
> > +		 * enabled to correctly indicate to callers of pat_enabled()
> > +		 * that CPU support for PAT is available.
> > +		 */
> > +		pat_bp_enabled = true;
> > +		pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> >   	}
> >   
> >   	__init_cache_modes(pat);
>
>
> Juergen
Chuck Zmudzinski July 15, 2022, 2:53 a.m. UTC | #23
On 7/14/2022 10:19 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 1:40 AM, Juergen Gross wrote:
> > On 13.07.22 03:36, Chuck Zmudzinski wrote:
> > > The commit 99c13b8c8896d7bcb92753bf
> > > ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > > incorrectly failed to account for the case in init_cache_modes() when
> > > CPUs do support PAT and falsely reported PAT to be disabled when in
> > > fact PAT is enabled. In some environments, notably in Xen PV domains,
> > > MTRR is disabled but PAT is still enabled, and that is the case
> > > that the aforementioned commit failed to account for.
> > > 
> > > As an unfortunate consequnce, the pat_enabled() function currently does
> > > not correctly report that PAT is enabled in such environments. The fix
> > > is implemented in init_cache_modes() by setting pat_bp_enabled to true
> > > in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> > > ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> > > to account for.
> > > 
> > > This approach arranges for pat_enabled() to return true in the Xen PV
> > > environment without undermining the rest of PAT MSR management logic
> > > that considers PAT to be disabled: Specifically, no writes to the PAT
> > > MSR should occur.
> > > 
> > > This patch fixes a regression that some users are experiencing with
> > > Linux as a Xen Dom0 driving particular Intel graphics devices by
> > > correctly reporting to the Intel i915 driver that PAT is enabled where
> > > previously it was falsely reporting that PAT is disabled. Some users
> > > are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> > > Dom0 are experiencing reduced graphics performance because the keying of
> > > the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> > > means that in particular graphics frame buffer accesses are quite a bit
> > > less performant than possible without this patch.
> > > 
> > > Also, with the current code, in the Xen PV environment, PAT will not be
> > > disabled if the administrator sets the "nopat" boot option. Introduce
> > > a new boolean variable, pat_force_disable, to forcibly disable PAT
> > > when the administrator sets the "nopat" option to override the default
> > > behavior of using the PAT configuration that Xen has provided.
> > > 
> > > For the new boolean to live in .init.data, init_cache_modes() also needs
> > > moving to .init.text (where it could/should have lived already before).
> > > 
> > > Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > > Co-developed-by: Jan Beulich <jbeulich@suse.com>
> > > Cc: stable@vger.kernel.org
> > > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> > > ---
> > > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> > >      *Add the necessary code to incorporate the "nopat" fix
> > >      *void init_cache_modes(void) -> void __init init_cache_modes(void)
> > >      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> > >      *Expand the commit message to include relevant parts of the commit
> > >       message of Jan Beulich's proposed patch for this problem
> > >      *Fix 'else if ... {' placement and indentation
> > >      *Remove indication the backport to stable branches is only back to 5.17.y
> > > 
> > > I think these changes address all the comments on the original patch
> > > 
> > > I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > > include Jan's idea for fixing "nopat" that was missing from the first
> > > version of the patch.
> > > 
> > > The patch has been tested, it works as expected with and without nopat
> > > in the Xen PV Dom0 environment. That is, "nopat" causes the system to
> > > exhibit the effects and problems that lack of PAT support causes.
> > > 
> > >   arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
> > >   1 file changed, 14 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > > index d5ef64ddd35e..10a37d309d23 100644
> > > --- a/arch/x86/mm/pat/memtype.c
> > > +++ b/arch/x86/mm/pat/memtype.c
> > > @@ -62,6 +62,7 @@
> > >   
> > >   static bool __read_mostly pat_bp_initialized;
> > >   static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> > > +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> > >   static bool __read_mostly pat_bp_enabled;
> > >   static bool __read_mostly pat_cm_initialized;
> > >   
> > > @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> > >   static int __init nopat(char *str)
> > >   {
> > >   	pat_disable("PAT support disabled via boot option.");
> > > +	pat_force_disabled = true;
> > >   	return 0;
> > >   }
> > >   early_param("nopat", nopat);
> > > @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> > >   	wrmsrl(MSR_IA32_CR_PAT, pat);
> > >   }
> > >   
> > > -void init_cache_modes(void)
> > > +void __init init_cache_modes(void)
> > >   {
> > >   	u64 pat = 0;
> > >   
> > > @@ -292,7 +294,7 @@ void init_cache_modes(void)
> > >   		rdmsrl(MSR_IA32_CR_PAT, pat);
> > >   	}
> > >   
> > > -	if (!pat) {
> > > +	if (!pat || pat_force_disabled) {
> >
> > Can we just remove this modification and ...
> >
> > >   		/*
> > >   		 * No PAT. Emulate the PAT table that corresponds to the two
> > >   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > > @@ -313,6 +315,16 @@ void init_cache_modes(void)
> > >   		 */
> > >   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> > >   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > > +	} else if (!pat_bp_enabled) {
> >
> > ... use
> >
> > +	} else if (!pat_bp_enabled && !pat_force_disabled) {
> >
> > here?
> >
> > This will result in the desired outcome in all cases IMO: If PAT wasn't
> > disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
> > Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
> > MTRR), then PAT will be set to "enabled" again.
>
> With that, you can also completely remove the new Boolean - it
> will be a meaningless variable wasting memory. This will also make
> my patch more or less do what Jan's patch does - the "nopat" option
> will not cause the situation when the PAT MSR does not match the
> software view. So you are basically proposing just going back to
> my original patch, after fixing the style problems, of course. That
> also would solve the problem of needing Jan's S-o-b. I am inclined,
> however, to wait for a maintainer who has power to actually do the
> commit, to make a comment. Your R-b to my v2 did not have much clout
> with the actual maintainers, as far as I can tell. I am somewhat annoyed
> that it was at your suggestion that my v2 ended up confusing the
> main issue, the regression, with the red herring of the "nopat"
> option.
>
> Chuck

Actually, what your change does depend on keeping
pat_force_disable, but after all the discussion and
further thinking about this, I would prefer that you
give a R-b to v3 as simply my original patch with the
style fixed. I think it is wrong to confuse the regression
with the "nopat" issue. If you and Jan want to do a patch
for the "nopat" issue, that is your decision. I am not interested
in that. I am interested in fixing the regression. Also, I am
not included to formally submit v3 until Dave, Andy, Boris, or
someone else with more clout here on Linux expresses
interest in giving this idea an R-b.

Chuck

>
>
> > > +		/*
> > > +		 * In some environments, specifically Xen PV, PAT
> > > +		 * initialization is skipped because MTRRs are disabled even
> > > +		 * though PAT is available. In such environments, set PAT to
> > > +		 * enabled to correctly indicate to callers of pat_enabled()
> > > +		 * that CPU support for PAT is available.
> > > +		 */
> > > +		pat_bp_enabled = true;
> > > +		pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> > >   	}
> > >   
> > >   	__init_cache_modes(pat);
> >
> >
> > Juergen
>
>
Chuck Zmudzinski July 15, 2022, 2:58 a.m. UTC | #24
On 7/14/2022 10:53 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 10:19 PM, Chuck Zmudzinski wrote:
> > On 7/14/2022 1:40 AM, Juergen Gross wrote:
> > > On 13.07.22 03:36, Chuck Zmudzinski wrote:
> > > > The commit 99c13b8c8896d7bcb92753bf
> > > > ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > > > incorrectly failed to account for the case in init_cache_modes() when
> > > > CPUs do support PAT and falsely reported PAT to be disabled when in
> > > > fact PAT is enabled. In some environments, notably in Xen PV domains,
> > > > MTRR is disabled but PAT is still enabled, and that is the case
> > > > that the aforementioned commit failed to account for.
> > > > 
> > > > As an unfortunate consequnce, the pat_enabled() function currently does
> > > > not correctly report that PAT is enabled in such environments. The fix
> > > > is implemented in init_cache_modes() by setting pat_bp_enabled to true
> > > > in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> > > > ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> > > > to account for.
> > > > 
> > > > This approach arranges for pat_enabled() to return true in the Xen PV
> > > > environment without undermining the rest of PAT MSR management logic
> > > > that considers PAT to be disabled: Specifically, no writes to the PAT
> > > > MSR should occur.
> > > > 
> > > > This patch fixes a regression that some users are experiencing with
> > > > Linux as a Xen Dom0 driving particular Intel graphics devices by
> > > > correctly reporting to the Intel i915 driver that PAT is enabled where
> > > > previously it was falsely reporting that PAT is disabled. Some users
> > > > are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> > > > Dom0 are experiencing reduced graphics performance because the keying of
> > > > the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> > > > means that in particular graphics frame buffer accesses are quite a bit
> > > > less performant than possible without this patch.
> > > > 
> > > > Also, with the current code, in the Xen PV environment, PAT will not be
> > > > disabled if the administrator sets the "nopat" boot option. Introduce
> > > > a new boolean variable, pat_force_disable, to forcibly disable PAT
> > > > when the administrator sets the "nopat" option to override the default
> > > > behavior of using the PAT configuration that Xen has provided.
> > > > 
> > > > For the new boolean to live in .init.data, init_cache_modes() also needs
> > > > moving to .init.text (where it could/should have lived already before).
> > > > 
> > > > Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > > > Co-developed-by: Jan Beulich <jbeulich@suse.com>
> > > > Cc: stable@vger.kernel.org
> > > > Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> > > > ---
> > > > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> > > >      *Add the necessary code to incorporate the "nopat" fix
> > > >      *void init_cache_modes(void) -> void __init init_cache_modes(void)
> > > >      *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> > > >      *Expand the commit message to include relevant parts of the commit
> > > >       message of Jan Beulich's proposed patch for this problem
> > > >      *Fix 'else if ... {' placement and indentation
> > > >      *Remove indication the backport to stable branches is only back to 5.17.y
> > > > 
> > > > I think these changes address all the comments on the original patch
> > > > 
> > > > I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> > > > include Jan's idea for fixing "nopat" that was missing from the first
> > > > version of the patch.
> > > > 
> > > > The patch has been tested, it works as expected with and without nopat
> > > > in the Xen PV Dom0 environment. That is, "nopat" causes the system to
> > > > exhibit the effects and problems that lack of PAT support causes.
> > > > 
> > > >   arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
> > > >   1 file changed, 14 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> > > > index d5ef64ddd35e..10a37d309d23 100644
> > > > --- a/arch/x86/mm/pat/memtype.c
> > > > +++ b/arch/x86/mm/pat/memtype.c
> > > > @@ -62,6 +62,7 @@
> > > >   
> > > >   static bool __read_mostly pat_bp_initialized;
> > > >   static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> > > > +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> > > >   static bool __read_mostly pat_bp_enabled;
> > > >   static bool __read_mostly pat_cm_initialized;
> > > >   
> > > > @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> > > >   static int __init nopat(char *str)
> > > >   {
> > > >   	pat_disable("PAT support disabled via boot option.");
> > > > +	pat_force_disabled = true;
> > > >   	return 0;
> > > >   }
> > > >   early_param("nopat", nopat);
> > > > @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> > > >   	wrmsrl(MSR_IA32_CR_PAT, pat);
> > > >   }
> > > >   
> > > > -void init_cache_modes(void)
> > > > +void __init init_cache_modes(void)
> > > >   {
> > > >   	u64 pat = 0;
> > > >   
> > > > @@ -292,7 +294,7 @@ void init_cache_modes(void)
> > > >   		rdmsrl(MSR_IA32_CR_PAT, pat);
> > > >   	}
> > > >   
> > > > -	if (!pat) {
> > > > +	if (!pat || pat_force_disabled) {
> > >
> > > Can we just remove this modification and ...
> > >
> > > >   		/*
> > > >   		 * No PAT. Emulate the PAT table that corresponds to the two
> > > >   		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> > > > @@ -313,6 +315,16 @@ void init_cache_modes(void)
> > > >   		 */
> > > >   		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> > > >   		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> > > > +	} else if (!pat_bp_enabled) {
> > >
> > > ... use
> > >
> > > +	} else if (!pat_bp_enabled && !pat_force_disabled) {
> > >
> > > here?
> > >
> > > This will result in the desired outcome in all cases IMO: If PAT wasn't
> > > disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
> > > Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
> > > MTRR), then PAT will be set to "enabled" again.
> >
> > With that, you can also completely remove the new Boolean - it
> > will be a meaningless variable wasting memory. This will also make
> > my patch more or less do what Jan's patch does - the "nopat" option
> > will not cause the situation when the PAT MSR does not match the
> > software view. So you are basically proposing just going back to
> > my original patch, after fixing the style problems, of course. That
> > also would solve the problem of needing Jan's S-o-b. I am inclined,
> > however, to wait for a maintainer who has power to actually do the
> > commit, to make a comment. Your R-b to my v2 did not have much clout
> > with the actual maintainers, as far as I can tell. I am somewhat annoyed
> > that it was at your suggestion that my v2 ended up confusing the
> > main issue, the regression, with the red herring of the "nopat"
> > option.
> >
> > Chuck
>
> Actually, what your change does depend on keeping
> pat_force_disable, but after all the discussion and
> further thinking about this, I would prefer that you
> give a R-b to v3 as simply my original patch with the
> style fixed. I think it is wrong to confuse the regression
> with the "nopat" issue. If you and Jan want to do a patch
> for the "nopat" issue, that is your decision. I am not interested
> in that. I am interested in fixing the regression. Also, I am
> not included to formally submit v3 until Dave, Andy, Boris, or
> someone else with more clout here on Linux expresses
> interest in giving this idea an R-b.

I meant to say "i am not inclined..." not I am not included..."

Sorry for the confusion,

Chuck

>
> >
> >
> > > > +		/*
> > > > +		 * In some environments, specifically Xen PV, PAT
> > > > +		 * initialization is skipped because MTRRs are disabled even
> > > > +		 * though PAT is available. In such environments, set PAT to
> > > > +		 * enabled to correctly indicate to callers of pat_enabled()
> > > > +		 * that CPU support for PAT is available.
> > > > +		 */
> > > > +		pat_bp_enabled = true;
> > > > +		pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
> > > >   	}
> > > >   
> > > >   	__init_cache_modes(pat);
> > >
> > >
> > > Juergen
> >
> >
>
Jürgen Groß July 15, 2022, 4:22 a.m. UTC | #25
On 15.07.22 04:19, Chuck Zmudzinski wrote:
> On 7/14/2022 1:40 AM, Juergen Gross wrote:
>> On 13.07.22 03:36, Chuck Zmudzinski wrote:
>>> The commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>> incorrectly failed to account for the case in init_cache_modes() when
>>> CPUs do support PAT and falsely reported PAT to be disabled when in
>>> fact PAT is enabled. In some environments, notably in Xen PV domains,
>>> MTRR is disabled but PAT is still enabled, and that is the case
>>> that the aforementioned commit failed to account for.
>>>
>>> As an unfortunate consequnce, the pat_enabled() function currently does
>>> not correctly report that PAT is enabled in such environments. The fix
>>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
>>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
>>> to account for.
>>>
>>> This approach arranges for pat_enabled() to return true in the Xen PV
>>> environment without undermining the rest of PAT MSR management logic
>>> that considers PAT to be disabled: Specifically, no writes to the PAT
>>> MSR should occur.
>>>
>>> This patch fixes a regression that some users are experiencing with
>>> Linux as a Xen Dom0 driving particular Intel graphics devices by
>>> correctly reporting to the Intel i915 driver that PAT is enabled where
>>> previously it was falsely reporting that PAT is disabled. Some users
>>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
>>> Dom0 are experiencing reduced graphics performance because the keying of
>>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
>>> means that in particular graphics frame buffer accesses are quite a bit
>>> less performant than possible without this patch.
>>>
>>> Also, with the current code, in the Xen PV environment, PAT will not be
>>> disabled if the administrator sets the "nopat" boot option. Introduce
>>> a new boolean variable, pat_force_disable, to forcibly disable PAT
>>> when the administrator sets the "nopat" option to override the default
>>> behavior of using the PAT configuration that Xen has provided.
>>>
>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>> moving to .init.text (where it could/should have lived already before).
>>>
>>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>>> ---
>>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
>>>       *Add the necessary code to incorporate the "nopat" fix
>>>       *void init_cache_modes(void) -> void __init init_cache_modes(void)
>>>       *Add Jan Beulich as Co-developer (Jan has not signed off yet)
>>>       *Expand the commit message to include relevant parts of the commit
>>>        message of Jan Beulich's proposed patch for this problem
>>>       *Fix 'else if ... {' placement and indentation
>>>       *Remove indication the backport to stable branches is only back to 5.17.y
>>>
>>> I think these changes address all the comments on the original patch
>>>
>>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
>>> include Jan's idea for fixing "nopat" that was missing from the first
>>> version of the patch.
>>>
>>> The patch has been tested, it works as expected with and without nopat
>>> in the Xen PV Dom0 environment. That is, "nopat" causes the system to
>>> exhibit the effects and problems that lack of PAT support causes.
>>>
>>>    arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
>>>    1 file changed, 14 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
>>> index d5ef64ddd35e..10a37d309d23 100644
>>> --- a/arch/x86/mm/pat/memtype.c
>>> +++ b/arch/x86/mm/pat/memtype.c
>>> @@ -62,6 +62,7 @@
>>>    
>>>    static bool __read_mostly pat_bp_initialized;
>>>    static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
>>> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
>>>    static bool __read_mostly pat_bp_enabled;
>>>    static bool __read_mostly pat_cm_initialized;
>>>    
>>> @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
>>>    static int __init nopat(char *str)
>>>    {
>>>    	pat_disable("PAT support disabled via boot option.");
>>> +	pat_force_disabled = true;
>>>    	return 0;
>>>    }
>>>    early_param("nopat", nopat);
>>> @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
>>>    	wrmsrl(MSR_IA32_CR_PAT, pat);
>>>    }
>>>    
>>> -void init_cache_modes(void)
>>> +void __init init_cache_modes(void)
>>>    {
>>>    	u64 pat = 0;
>>>    
>>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
>>>    		rdmsrl(MSR_IA32_CR_PAT, pat);
>>>    	}
>>>    
>>> -	if (!pat) {
>>> +	if (!pat || pat_force_disabled) {
>>
>> Can we just remove this modification and ...
>>
>>>    		/*
>>>    		 * No PAT. Emulate the PAT table that corresponds to the two
>>>    		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
>>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
>>>    		 */
>>>    		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
>>>    		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
>>> +	} else if (!pat_bp_enabled) {
>>
>> ... use
>>
>> +	} else if (!pat_bp_enabled && !pat_force_disabled) {
>>
>> here?
>>
>> This will result in the desired outcome in all cases IMO: If PAT wasn't
>> disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
>> Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
>> MTRR), then PAT will be set to "enabled" again.
> 
> With that, you can also completely remove the new Boolean - it
> will be a meaningless variable wasting memory. This will also make

No, it is making a difference with "nopat" having been specified.

In the Xen PV case we will have pat_bp_enabled == false due to the
lack of MTRR. We don't want to set it to true if "nopat" has been
specified on the command line, so pat_force_disabled should not be
true when we are setting pat_bp_enabled to true again.

> my patch more or less do what Jan's patch does - the "nopat" option
> will not cause the situation when the PAT MSR does not match the
> software view. So you are basically proposing just going back to
> my original patch, after fixing the style problems, of course. That
> also would solve the problem of needing Jan's S-o-b. I am inclined,
> however, to wait for a maintainer who has power to actually do the
> commit, to make a comment. Your R-b to my v2 did not have much clout
> with the actual maintainers, as far as I can tell. I am somewhat annoyed
> that it was at your suggestion that my v2 ended up confusing the
> main issue, the regression, with the red herring of the "nopat"
> option.

I'm sorry for that.


Juergen
Chuck Zmudzinski July 15, 2022, 4:42 a.m. UTC | #26
On 7/15/2022 12:22 AM, Juergen Gross wrote:
> On 15.07.22 04:19, Chuck Zmudzinski wrote:
> > On 7/14/2022 1:40 AM, Juergen Gross wrote:
> >> On 13.07.22 03:36, Chuck Zmudzinski wrote:
> >>> The commit 99c13b8c8896d7bcb92753bf
> >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> >>> incorrectly failed to account for the case in init_cache_modes() when
> >>> CPUs do support PAT and falsely reported PAT to be disabled when in
> >>> fact PAT is enabled. In some environments, notably in Xen PV domains,
> >>> MTRR is disabled but PAT is still enabled, and that is the case
> >>> that the aforementioned commit failed to account for.
> >>>
> >>> As an unfortunate consequnce, the pat_enabled() function currently does
> >>> not correctly report that PAT is enabled in such environments. The fix
> >>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> >>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> >>> to account for.
> >>>
> >>> This approach arranges for pat_enabled() to return true in the Xen PV
> >>> environment without undermining the rest of PAT MSR management logic
> >>> that considers PAT to be disabled: Specifically, no writes to the PAT
> >>> MSR should occur.
> >>>
> >>> This patch fixes a regression that some users are experiencing with
> >>> Linux as a Xen Dom0 driving particular Intel graphics devices by
> >>> correctly reporting to the Intel i915 driver that PAT is enabled where
> >>> previously it was falsely reporting that PAT is disabled. Some users
> >>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> >>> Dom0 are experiencing reduced graphics performance because the keying of
> >>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> >>> means that in particular graphics frame buffer accesses are quite a bit
> >>> less performant than possible without this patch.
> >>>
> >>> Also, with the current code, in the Xen PV environment, PAT will not be
> >>> disabled if the administrator sets the "nopat" boot option. Introduce
> >>> a new boolean variable, pat_force_disable, to forcibly disable PAT
> >>> when the administrator sets the "nopat" option to override the default
> >>> behavior of using the PAT configuration that Xen has provided.
> >>>
> >>> For the new boolean to live in .init.data, init_cache_modes() also needs
> >>> moving to .init.text (where it could/should have lived already before).
> >>>
> >>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> >>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> >>> ---
> >>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
> >>>       *Add the necessary code to incorporate the "nopat" fix
> >>>       *void init_cache_modes(void) -> void __init init_cache_modes(void)
> >>>       *Add Jan Beulich as Co-developer (Jan has not signed off yet)
> >>>       *Expand the commit message to include relevant parts of the commit
> >>>        message of Jan Beulich's proposed patch for this problem
> >>>       *Fix 'else if ... {' placement and indentation
> >>>       *Remove indication the backport to stable branches is only back to 5.17.y
> >>>
> >>> I think these changes address all the comments on the original patch
> >>>
> >>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to
> >>> include Jan's idea for fixing "nopat" that was missing from the first
> >>> version of the patch.
> >>>
> >>> The patch has been tested, it works as expected with and without nopat
> >>> in the Xen PV Dom0 environment. That is, "nopat" causes the system to
> >>> exhibit the effects and problems that lack of PAT support causes.
> >>>
> >>>    arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
> >>>    1 file changed, 14 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
> >>> index d5ef64ddd35e..10a37d309d23 100644
> >>> --- a/arch/x86/mm/pat/memtype.c
> >>> +++ b/arch/x86/mm/pat/memtype.c
> >>> @@ -62,6 +62,7 @@
> >>>    
> >>>    static bool __read_mostly pat_bp_initialized;
> >>>    static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> >>> +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
> >>>    static bool __read_mostly pat_bp_enabled;
> >>>    static bool __read_mostly pat_cm_initialized;
> >>>    
> >>> @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
> >>>    static int __init nopat(char *str)
> >>>    {
> >>>    	pat_disable("PAT support disabled via boot option.");
> >>> +	pat_force_disabled = true;
> >>>    	return 0;
> >>>    }
> >>>    early_param("nopat", nopat);
> >>> @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
> >>>    	wrmsrl(MSR_IA32_CR_PAT, pat);
> >>>    }
> >>>    
> >>> -void init_cache_modes(void)
> >>> +void __init init_cache_modes(void)
> >>>    {
> >>>    	u64 pat = 0;
> >>>    
> >>> @@ -292,7 +294,7 @@ void init_cache_modes(void)
> >>>    		rdmsrl(MSR_IA32_CR_PAT, pat);
> >>>    	}
> >>>    
> >>> -	if (!pat) {
> >>> +	if (!pat || pat_force_disabled) {
> >>
> >> Can we just remove this modification and ...
> >>
> >>>    		/*
> >>>    		 * No PAT. Emulate the PAT table that corresponds to the two
> >>>    		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
> >>> @@ -313,6 +315,16 @@ void init_cache_modes(void)
> >>>    		 */
> >>>    		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
> >>>    		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
> >>> +	} else if (!pat_bp_enabled) {
> >>
> >> ... use
> >>
> >> +	} else if (!pat_bp_enabled && !pat_force_disabled) {
> >>
> >> here?
> >>
> >> This will result in the desired outcome in all cases IMO: If PAT wasn't
> >> disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or
> >> Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of
> >> MTRR), then PAT will be set to "enabled" again.
> > 
> > With that, you can also completely remove the new Boolean - it
> > will be a meaningless variable wasting memory. This will also make
>
> No, it is making a difference with "nopat" having been specified.
>
> In the Xen PV case we will have pat_bp_enabled == false due to the
> lack of MTRR. We don't want to set it to true if "nopat" has been
> specified on the command line, so pat_force_disabled should not be
> true when we are setting pat_bp_enabled to true again.
>
> > my patch more or less do what Jan's patch does - the "nopat" option
> > will not cause the situation when the PAT MSR does not match the
> > software view. So you are basically proposing just going back to
> > my original patch, after fixing the style problems, of course. That
> > also would solve the problem of needing Jan's S-o-b. I am inclined,
> > however, to wait for a maintainer who has power to actually do the
> > commit, to make a comment. Your R-b to my v2 did not have much clout
> > with the actual maintainers, as far as I can tell. I am somewhat annoyed
> > that it was at your suggestion that my v2 ended up confusing the
> > main issue, the regression, with the red herring of the "nopat"
> > option.
>
> I'm sorry for that.

I accept your apology. A few days back you indicated Boris was willing to
go along with the approach I was suggesting. Do you have any idea why
he didn't give me an R-b for either my original patch or v2 or at least tell
me what I would have to do to get his R-b?

Chuck

>
>
> Juergen
Thorsten Leemhuis July 15, 2022, 5 a.m. UTC | #27
On 15.07.22 04:07, Chuck Zmudzinski wrote:
> On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
>> On 13.07.22 03:36, Chuck Zmudzinski wrote:
>>> The commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>> incorrectly failed to account for the case in init_cache_modes() when
>>> CPUs do support PAT and falsely reported PAT to be disabled when in
>>> fact PAT is enabled. In some environments, notably in Xen PV domains,
>>> MTRR is disabled but PAT is still enabled, and that is the case
>>> that the aforementioned commit failed to account for.
>>>
>>> As an unfortunate consequnce, the pat_enabled() function currently does
>>> not correctly report that PAT is enabled in such environments. The fix
>>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
>>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
>>> to account for.
>>>
>>> This approach arranges for pat_enabled() to return true in the Xen PV
>>> environment without undermining the rest of PAT MSR management logic
>>> that considers PAT to be disabled: Specifically, no writes to the PAT
>>> MSR should occur.
>>>
>>> This patch fixes a regression that some users are experiencing with
>>> Linux as a Xen Dom0 driving particular Intel graphics devices by
>>> correctly reporting to the Intel i915 driver that PAT is enabled where
>>> previously it was falsely reporting that PAT is disabled. Some users
>>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
>>> Dom0 are experiencing reduced graphics performance because the keying of
>>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
>>> means that in particular graphics frame buffer accesses are quite a bit
>>> less performant than possible without this patch.
>>>
>>> Also, with the current code, in the Xen PV environment, PAT will not be
>>> disabled if the administrator sets the "nopat" boot option. Introduce
>>> a new boolean variable, pat_force_disable, to forcibly disable PAT
>>> when the administrator sets the "nopat" option to override the default
>>> behavior of using the PAT configuration that Xen has provided.
>>>
>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>> moving to .init.text (where it could/should have lived already before).
>>>
>>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>
>> BTW, "submitting-patches.rst" says it should just be "the first 12
>> characters of the SHA-1 ID"
> 
> Actually it says "at least", 

/me looks a bit confused, as he quoted above directly from that file

Ohh, that "at least" is in a different section of the document. :-D For
the fixes tag is explicitly "12 characters":

```
If your patch fixes a bug in a specific commit, e.g. you found an issue
using git bisect, please use the ‘Fixes:’ tag with the first 12
characters of the SHA-1 ID, and the one line summary.
```

> so more that 12 is It is probably safest
> to put the entire SHA-1 ID in because of the possibility of
> a collision.

Yes, sure, but it's a little mess if everybody uses something different
and 12 characters is what was agreed on as "good enough" for now.

> At least that's how I understand what
> submitting-patches.rst.
>
>>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>>
>> Sorry, have to ask: is this supposed to finally fix this regression?
>> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> 
> Yes that's the first report I saw to lkml about this isssue.

Thx for confirming.

> So if I submit a v3 I will include that.

Thx.

> But my patch does not have a sign-off from the
> Co-developer as I mentioned in a message earlier today, and the
> discussion that has ensued leads me to think a better solution is to just
> revert the commit in the i915 driver instead, and leave the bigger questions
> for Juergen Gross and his plans to re-work the x86/PAT code in September,
> as he said on this thread in the last couple of days. This patch is dead
> now,
> as far as I can tell, because the Co-developer is not cooperating.

k, thx for the quick summary, this whole thing is a bit hard to follow
for me: I can only briefly look into most regressions, as there is only
so much time in a day...

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply, it's in everyone's interest to set the public record straight.
Jan Beulich July 15, 2022, 10:05 a.m. UTC | #28
On 15.07.2022 04:07, Chuck Zmudzinski wrote:
> On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
>> On 13.07.22 03:36, Chuck Zmudzinski wrote:
>>> The commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>> incorrectly failed to account for the case in init_cache_modes() when
>>> CPUs do support PAT and falsely reported PAT to be disabled when in
>>> fact PAT is enabled. In some environments, notably in Xen PV domains,
>>> MTRR is disabled but PAT is still enabled, and that is the case
>>> that the aforementioned commit failed to account for.
>>>
>>> As an unfortunate consequnce, the pat_enabled() function currently does
>>> not correctly report that PAT is enabled in such environments. The fix
>>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
>>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
>>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
>>> to account for.
>>>
>>> This approach arranges for pat_enabled() to return true in the Xen PV
>>> environment without undermining the rest of PAT MSR management logic
>>> that considers PAT to be disabled: Specifically, no writes to the PAT
>>> MSR should occur.
>>>
>>> This patch fixes a regression that some users are experiencing with
>>> Linux as a Xen Dom0 driving particular Intel graphics devices by
>>> correctly reporting to the Intel i915 driver that PAT is enabled where
>>> previously it was falsely reporting that PAT is disabled. Some users
>>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
>>> Dom0 are experiencing reduced graphics performance because the keying of
>>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
>>> means that in particular graphics frame buffer accesses are quite a bit
>>> less performant than possible without this patch.
>>>
>>> Also, with the current code, in the Xen PV environment, PAT will not be
>>> disabled if the administrator sets the "nopat" boot option. Introduce
>>> a new boolean variable, pat_force_disable, to forcibly disable PAT
>>> when the administrator sets the "nopat" option to override the default
>>> behavior of using the PAT configuration that Xen has provided.
>>>
>>> For the new boolean to live in .init.data, init_cache_modes() also needs
>>> moving to .init.text (where it could/should have lived already before).
>>>
>>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
>>
>> BTW, "submitting-patches.rst" says it should just be "the first 12
>> characters of the SHA-1 ID"
> 
> Actually it says "at least", so more that 12 is It is probably safest
> to put the entire SHA-1 ID in because of the possibility of
> a collision. At least that's how I understand what
> submitting-patches.rst.
> 
>>
>>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
>>> Cc: stable@vger.kernel.org
>>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
>>
>> Sorry, have to ask: is this supposed to finally fix this regression?
>> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> 
> Yes that's the first report I saw to lkml about this isssue. So if I submit
> a v3 I will include that. But my patch does not have a sign-off from the
> Co-developer as I mentioned in a message earlier today, and the
> discussion that has ensued leads me to think a better solution is to just
> revert the commit in the i915 driver instead, and leave the bigger questions
> for Juergen Gross and his plans to re-work the x86/PAT code in September,
> as he said on this thread in the last couple of days. This patch is dead
> now,
> as far as I can tell, because the Co-developer is not cooperating.

???

Jan
Chuck Zmudzinski July 15, 2022, 7:53 p.m. UTC | #29
On 7/15/2022 6:05 AM, Jan Beulich wrote:
> On 15.07.2022 04:07, Chuck Zmudzinski wrote:
> > On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
> >> On 13.07.22 03:36, Chuck Zmudzinski wrote:
> >>> The commit 99c13b8c8896d7bcb92753bf
> >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> >>> incorrectly failed to account for the case in init_cache_modes() when
> >>> CPUs do support PAT and falsely reported PAT to be disabled when in
> >>> fact PAT is enabled. In some environments, notably in Xen PV domains,
> >>> MTRR is disabled but PAT is still enabled, and that is the case
> >>> that the aforementioned commit failed to account for.
> >>>
> >>> As an unfortunate consequnce, the pat_enabled() function currently does
> >>> not correctly report that PAT is enabled in such environments. The fix
> >>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> >>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> >>> to account for.
> >>>
> >>> This approach arranges for pat_enabled() to return true in the Xen PV
> >>> environment without undermining the rest of PAT MSR management logic
> >>> that considers PAT to be disabled: Specifically, no writes to the PAT
> >>> MSR should occur.
> >>>
> >>> This patch fixes a regression that some users are experiencing with
> >>> Linux as a Xen Dom0 driving particular Intel graphics devices by
> >>> correctly reporting to the Intel i915 driver that PAT is enabled where
> >>> previously it was falsely reporting that PAT is disabled. Some users
> >>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> >>> Dom0 are experiencing reduced graphics performance because the keying of
> >>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> >>> means that in particular graphics frame buffer accesses are quite a bit
> >>> less performant than possible without this patch.
> >>>
> >>> Also, with the current code, in the Xen PV environment, PAT will not be
> >>> disabled if the administrator sets the "nopat" boot option. Introduce
> >>> a new boolean variable, pat_force_disable, to forcibly disable PAT
> >>> when the administrator sets the "nopat" option to override the default
> >>> behavior of using the PAT configuration that Xen has provided.
> >>>
> >>> For the new boolean to live in .init.data, init_cache_modes() also needs
> >>> moving to .init.text (where it could/should have lived already before).
> >>>
> >>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> >>
> >> BTW, "submitting-patches.rst" says it should just be "the first 12
> >> characters of the SHA-1 ID"
> > 
> > Actually it says "at least", so more that 12 is It is probably safest
> > to put the entire SHA-1 ID in because of the possibility of
> > a collision. At least that's how I understand what
> > submitting-patches.rst.
> > 
> >>
> >>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> >>> Cc: stable@vger.kernel.org
> >>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> >>
> >> Sorry, have to ask: is this supposed to finally fix this regression?
> >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > 
> > Yes that's the first report I saw to lkml about this isssue. So if I submit
> > a v3 I will include that. But my patch does not have a sign-off from the
> > Co-developer as I mentioned in a message earlier today, and the
> > discussion that has ensued leads me to think a better solution is to just
> > revert the commit in the i915 driver instead, and leave the bigger questions
> > for Juergen Gross and his plans to re-work the x86/PAT code in September,
> > as he said on this thread in the last couple of days. This patch is dead
> > now,
> > as far as I can tell, because the Co-developer is not cooperating.
>
> ???
>
> Jan

I think I recall you said my patch proves I don't want your S-o-b. I
also want
to add some useful logs with your approach, probably a pr_warn, which you
are not interested in doing. I think it is probably necessary, under the
current
situation, to warn all users of Xen/Linux that Linux on Xen is not
guaranteed
to be secure and full-featured anymore. That is also why I think the Linux
maintainers are ignoring your patch. They are basically saying Xen is just
some weird one-off thing, I can't remember who said it, but I did read
it here
in some of the comments on your patch, and you do not seem to be listening
to and responding to what the Linux developers are asking you to do.

Two things I see here in my efforts to get a patch to fix this regression:

1. Does Xen have plans to give Linux running in Dom0 write-access to the
PAT MSR?

2. Does Xen have plans to expose MTRRs to Linux running in Dom0?

These don't have to be the defaults for Dom0. But can Xen at least make
these as supported configurations for Linux? Then, the problem of Xen
being some weird one-off thing goes away. At least that's how I see
the situation. Maybe Xen can provide these for Linux in Dom0, but
from what I've read here, it seems Xen is not willing to do that for
Linux. Do I understand that correctly?

Chuck
Chuck Zmudzinski July 16, 2022, 11:02 a.m. UTC | #30
On 7/15/2022 3:53 PM, Chuck Zmudzinski wrote:
> On 7/15/2022 6:05 AM, Jan Beulich wrote:
> > On 15.07.2022 04:07, Chuck Zmudzinski wrote:
> > > On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
> > >> On 13.07.22 03:36, Chuck Zmudzinski wrote:
> > >>> The commit 99c13b8c8896d7bcb92753bf
> > >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > >>> incorrectly failed to account for the case in init_cache_modes() when
> > >>> CPUs do support PAT and falsely reported PAT to be disabled when in
> > >>> fact PAT is enabled. In some environments, notably in Xen PV domains,
> > >>> MTRR is disabled but PAT is still enabled, and that is the case
> > >>> that the aforementioned commit failed to account for.
> > >>>
> > >>> As an unfortunate consequnce, the pat_enabled() function currently does
> > >>> not correctly report that PAT is enabled in such environments. The fix
> > >>> is implemented in init_cache_modes() by setting pat_bp_enabled to true
> > >>> in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
> > >>> ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
> > >>> to account for.
> > >>>
> > >>> This approach arranges for pat_enabled() to return true in the Xen PV
> > >>> environment without undermining the rest of PAT MSR management logic
> > >>> that considers PAT to be disabled: Specifically, no writes to the PAT
> > >>> MSR should occur.
> > >>>
> > >>> This patch fixes a regression that some users are experiencing with
> > >>> Linux as a Xen Dom0 driving particular Intel graphics devices by
> > >>> correctly reporting to the Intel i915 driver that PAT is enabled where
> > >>> previously it was falsely reporting that PAT is disabled. Some users
> > >>> are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
> > >>> Dom0 are experiencing reduced graphics performance because the keying of
> > >>> the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
> > >>> means that in particular graphics frame buffer accesses are quite a bit
> > >>> less performant than possible without this patch.
> > >>>
> > >>> Also, with the current code, in the Xen PV environment, PAT will not be
> > >>> disabled if the administrator sets the "nopat" boot option. Introduce
> > >>> a new boolean variable, pat_force_disable, to forcibly disable PAT
> > >>> when the administrator sets the "nopat" option to override the default
> > >>> behavior of using the PAT configuration that Xen has provided.
> > >>>
> > >>> For the new boolean to live in .init.data, init_cache_modes() also needs
> > >>> moving to .init.text (where it could/should have lived already before).
> > >>>
> > >>> Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
> > >>
> > >> BTW, "submitting-patches.rst" says it should just be "the first 12
> > >> characters of the SHA-1 ID"
> > > 
> > > Actually it says "at least", so more that 12 is It is probably safest
> > > to put the entire SHA-1 ID in because of the possibility of
> > > a collision. At least that's how I understand what
> > > submitting-patches.rst.
> > > 
> > >>
> > >>> Co-developed-by: Jan Beulich <jbeulich@suse.com>
> > >>> Cc: stable@vger.kernel.org
> > >>> Signed-off-by: Chuck Zmudzinski <brchuckz@aol.com>
> > >>
> > >> Sorry, have to ask: is this supposed to finally fix this regression?
> > >> https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
> > > 
> > > Yes that's the first report I saw to lkml about this isssue. So if I submit
> > > a v3 I will include that. But my patch does not have a sign-off from the
> > > Co-developer as I mentioned in a message earlier today, and the
> > > discussion that has ensued leads me to think a better solution is to just
> > > revert the commit in the i915 driver instead, and leave the bigger questions
> > > for Juergen Gross and his plans to re-work the x86/PAT code in September,
> > > as he said on this thread in the last couple of days. This patch is dead
> > > now,
> > > as far as I can tell, because the Co-developer is not cooperating.
> >
> > ???
> >
> > Jan
>
> I think I recall you said my patch proves I don't want your S-o-b. I
> also want
> to add some useful logs with your approach, probably a pr_warn, which you
> are not interested in doing. I think it is probably necessary, under the
> current
> situation, to warn all users of Xen/Linux that Linux on Xen is not
> guaranteed
> to be secure and full-featured anymore. That is also why I think the Linux
> maintainers are ignoring your patch. They are basically saying Xen is just
> some weird one-off thing, I can't remember who said it, but I did read
> it here
> in some of the comments on your patch, and you do not seem to be listening
> to and responding to what the Linux developers are asking you to do.
>
> Two things I see here in my efforts to get a patch to fix this regression:
>
> 1. Does Xen have plans to give Linux running in Dom0 write-access to the
> PAT MSR?
>
> 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
>
> These don't have to be the defaults for Dom0. But can Xen at least make
> these as supported configurations for Linux? Then, the problem of Xen
> being some weird one-off thing goes away. At least that's how I see
> the situation. Maybe Xen can provide these for Linux in Dom0, but
> from what I've read here, it seems Xen is not willing to do that for
> Linux. Do I understand that correctly?
>
> Chuck

???

Chuck
Jan Beulich July 18, 2022, 6:07 a.m. UTC | #31
On 15.07.2022 21:53, Chuck Zmudzinski wrote:
> Two things I see here in my efforts to get a patch to fix this regression:
> 
> 1. Does Xen have plans to give Linux running in Dom0 write-access to the
> PAT MSR?

No, as this is not technically feasible (all physical CPUs should run
with the same value in the MSR, or else other issues arise).

> 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?

Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I
don't think there are plans on the Xen side to support the MSR
interface (and hence to expose the CPUID bit), and iirc there are
no plans on the Linux side to use the MTRR interface. This also
wouldn't really make sense anymore now that it has become quite
clear that Linux wants to have PAT working without depending on
MTRR.

Jan
Chuck Zmudzinski July 18, 2022, 11:31 a.m. UTC | #32
On 7/18/2022 2:07 AM, Jan Beulich wrote:
> On 15.07.2022 21:53, Chuck Zmudzinski wrote:
> > Two things I see here in my efforts to get a patch to fix this regression:
> > 
> > 1. Does Xen have plans to give Linux running in Dom0 write-access to the
> > PAT MSR?
>
> No, as this is not technically feasible (all physical CPUs should run
> with the same value in the MSR, or else other issues arise).
>
> > 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
>
> Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I
> don't think there are plans on the Xen side to support the MSR
> interface (and hence to expose the CPUID bit), and iirc there are
> no plans on the Linux side to use the MTRR interface. This also
> wouldn't really make sense anymore now that it has become quite
> clear that Linux wants to have PAT working without depending on
> MTRR.

I am not so sure about that, given what Borislav Petkov
said when commenting on your patch here:

https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/

Specifically, Borislav Petkov wrote on Tue, 5 Jul 2022 18:14:23 +0200:

Actually, the current goal is to adjust Xen dom0 because:

1. it uses the PAT code

2. but then it does something special and hides the MTRRs

which is not something real hardware does.

So this one-off thing should be prominent, visible and not get in the
way.

--------------end of Borislav Petkov quote-----------

Jan, can you explain this comment by Borislav Petkov about
Xen being a "one-off thing" that hides MTRRs and needs
to be "adjusted" so it does "not get in the way"?

Borislav, are you willing to retract the comments you made
on Tue, 5 Jul 2022 18:14:23 +0200 about Xen?

>
> Jan

Jan, thanks for answering my questions.

Chuck
Jan Beulich July 18, 2022, 11:39 a.m. UTC | #33
On 18.07.2022 13:31, Chuck Zmudzinski wrote:
> On 7/18/2022 2:07 AM, Jan Beulich wrote:
>> On 15.07.2022 21:53, Chuck Zmudzinski wrote:
>>> Two things I see here in my efforts to get a patch to fix this regression:
>>>
>>> 1. Does Xen have plans to give Linux running in Dom0 write-access to the
>>> PAT MSR?
>>
>> No, as this is not technically feasible (all physical CPUs should run
>> with the same value in the MSR, or else other issues arise).
>>
>>> 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
>>
>> Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I
>> don't think there are plans on the Xen side to support the MSR
>> interface (and hence to expose the CPUID bit), and iirc there are
>> no plans on the Linux side to use the MTRR interface. This also
>> wouldn't really make sense anymore now that it has become quite
>> clear that Linux wants to have PAT working without depending on
>> MTRR.
> 
> I am not so sure about that, given what Borislav Petkov
> said when commenting on your patch here:
> 
> https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
> 
> Specifically, Borislav Petkov wrote on Tue, 5 Jul 2022 18:14:23 +0200:
> 
> Actually, the current goal is to adjust Xen dom0 because:
> 
> 1. it uses the PAT code
> 
> 2. but then it does something special and hides the MTRRs
> 
> which is not something real hardware does.
> 
> So this one-off thing should be prominent, visible and not get in the
> way.
> 
> --------------end of Borislav Petkov quote-----------

And then, a day later, he said

"So I'm being told that it would be generally beneficial for all kinds of
 virtualization solutions to be able to support PAT only, without MTRRs
 so it would be interesting to see how ugly it would become to decouple
 PAT from MTRRs in Linux..."

> Jan, can you explain this comment by Borislav Petkov about
> Xen being a "one-off thing" that hides MTRRs and needs
> to be "adjusted" so it does "not get in the way"?

I'm afraid this isn't the first time that you ask people to explain
what somebody else said. I don't follow why you think I could better
explain what Boris said and why than he could do himself.

Jan
Chuck Zmudzinski July 18, 2022, 11:45 a.m. UTC | #34
On 7/18/2022 7:39 AM, Jan Beulich wrote:
> On 18.07.2022 13:31, Chuck Zmudzinski wrote:
> > On 7/18/2022 2:07 AM, Jan Beulich wrote:
> >> On 15.07.2022 21:53, Chuck Zmudzinski wrote:
> >>> Two things I see here in my efforts to get a patch to fix this regression:
> >>>
> >>> 1. Does Xen have plans to give Linux running in Dom0 write-access to the
> >>> PAT MSR?
> >>
> >> No, as this is not technically feasible (all physical CPUs should run
> >> with the same value in the MSR, or else other issues arise).
> >>
> >>> 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
> >>
> >> Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I
> >> don't think there are plans on the Xen side to support the MSR
> >> interface (and hence to expose the CPUID bit), and iirc there are
> >> no plans on the Linux side to use the MTRR interface. This also
> >> wouldn't really make sense anymore now that it has become quite
> >> clear that Linux wants to have PAT working without depending on
> >> MTRR.
> > 
> > I am not so sure about that, given what Borislav Petkov
> > said when commenting on your patch here:
> > 
> > https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
> > 
> > Specifically, Borislav Petkov wrote on Tue, 5 Jul 2022 18:14:23 +0200:
> > 
> > Actually, the current goal is to adjust Xen dom0 because:
> > 
> > 1. it uses the PAT code
> > 
> > 2. but then it does something special and hides the MTRRs
> > 
> > which is not something real hardware does.
> > 
> > So this one-off thing should be prominent, visible and not get in the
> > way.
> > 
> > --------------end of Borislav Petkov quote-----------
>
> And then, a day later, he said
>
> "So I'm being told that it would be generally beneficial for all kinds of
>  virtualization solutions to be able to support PAT only, without MTRRs
>  so it would be interesting to see how ugly it would become to decouple
>  PAT from MTRRs in Linux..."
>
> > Jan, can you explain this comment by Borislav Petkov about
> > Xen being a "one-off thing" that hides MTRRs and needs
> > to be "adjusted" so it does "not get in the way"?
>
> I'm afraid this isn't the first time that you ask people to explain
> what somebody else said. I don't follow why you think I could better
> explain what Boris said and why than he could do himself.

That is why I also asked Boris to say something now to clarify his
opinion on these matters. Let's wait and see if Boris says something
to clarify his opinion.

Chuck

>
> Jan
Chuck Zmudzinski July 18, 2022, 12:12 p.m. UTC | #35
On 7/18/2022 7:39 AM, Jan Beulich wrote:
> On 18.07.2022 13:31, Chuck Zmudzinski wrote:
> > On 7/18/2022 2:07 AM, Jan Beulich wrote:
> >> On 15.07.2022 21:53, Chuck Zmudzinski wrote:
> >>> Two things I see here in my efforts to get a patch to fix this regression:
> >>>
> >>> 1. Does Xen have plans to give Linux running in Dom0 write-access to the
> >>> PAT MSR?
> >>
> >> No, as this is not technically feasible (all physical CPUs should run
> >> with the same value in the MSR, or else other issues arise).
> >>
> >>> 2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
> >>
> >> Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I
> >> don't think there are plans on the Xen side to support the MSR
> >> interface (and hence to expose the CPUID bit), and iirc there are
> >> no plans on the Linux side to use the MTRR interface. This also
> >> wouldn't really make sense anymore now that it has become quite
> >> clear that Linux wants to have PAT working without depending on
> >> MTRR.
> > 
> > I am not so sure about that, given what Borislav Petkov
> > said when commenting on your patch here:
> > 
> > https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
> > 
> > Specifically, Borislav Petkov wrote on Tue, 5 Jul 2022 18:14:23 +0200:
> > 
> > Actually, the current goal is to adjust Xen dom0 because:
> > 
> > 1. it uses the PAT code
> > 
> > 2. but then it does something special and hides the MTRRs
> > 
> > which is not something real hardware does.
> > 
> > So this one-off thing should be prominent, visible and not get in the
> > way.
> > 
> > --------------end of Borislav Petkov quote-----------
>
> And then, a day later, he said
>
> "So I'm being told that it would be generally beneficial for all kinds of
>  virtualization solutions to be able to support PAT only, without MTRRs
>  so it would be interesting to see how ugly it would become to decouple
>  PAT from MTRRs in Linux..."

What if it proves to be too ugly to decouple PAT from MTRRs? Then
I doubt that "Linux wants to have PAT working without depending
on MTRR." We can hope that Juergen's work to decouple PAT from
MTRRs is not so ugly that it cannot be done, but that is by no means
certain at this point. At the very least, this means the fix to the
regression
will be at least delayed considerably, and possibly this means this
regression will never be fixed in the mainline kernel release.

Chuck
Chuck Zmudzinski Aug. 17, 2022, 4:39 p.m. UTC | #36
On 7/14/2022 10:07 PM, Chuck Zmudzinski wrote:
> On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
>
> >
> > Sorry, have to ask: is this supposed to finally fix this regression?
> > https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
>
> Yes that's the first report I saw to lkml about this isssue.

Hi Thorsten,

Actually, now I realize that was not the first report to lkml about this
issue, although it *was* the first report to the regressions mailing list.

Actually, the first report to lkml about this issue was Jan's patch that
will hopefully soon make it to linux-next and mainline. So the proper
Link: tag for this issue in the actual patch to be committed to the mainline
kernel is to Jan's patch that was originally posted to lkml before any
user reported it to the regressions mailing list. To know there was a
regression from Jan's original patch, one would need to have read his
commit message since he did not actually report it as a regression to the
regressions mailing list at that time, nor did he identify a culprit commit
at that time.

Bottom line: everything seems OK right now because the patch moving
towards mainline does have the correct Link: tag. Thanks for all the
work you have done on this regression.

Best regards,

Chuck
diff mbox series

Patch

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d5ef64ddd35e..10a37d309d23 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@ 
 
 static bool __read_mostly pat_bp_initialized;
 static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
 static bool __read_mostly pat_bp_enabled;
 static bool __read_mostly pat_cm_initialized;
 
@@ -86,6 +87,7 @@  void pat_disable(const char *msg_reason)
 static int __init nopat(char *str)
 {
 	pat_disable("PAT support disabled via boot option.");
+	pat_force_disabled = true;
 	return 0;
 }
 early_param("nopat", nopat);
@@ -272,7 +274,7 @@  static void pat_ap_init(u64 pat)
 	wrmsrl(MSR_IA32_CR_PAT, pat);
 }
 
-void init_cache_modes(void)
+void __init init_cache_modes(void)
 {
 	u64 pat = 0;
 
@@ -292,7 +294,7 @@  void init_cache_modes(void)
 		rdmsrl(MSR_IA32_CR_PAT, pat);
 	}
 
-	if (!pat) {
+	if (!pat || pat_force_disabled) {
 		/*
 		 * No PAT. Emulate the PAT table that corresponds to the two
 		 * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@  void init_cache_modes(void)
 		 */
 		pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
 		      PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+	} else if (!pat_bp_enabled) {
+		/*
+		 * In some environments, specifically Xen PV, PAT
+		 * initialization is skipped because MTRRs are disabled even
+		 * though PAT is available. In such environments, set PAT to
+		 * enabled to correctly indicate to callers of pat_enabled()
+		 * that CPU support for PAT is available.
+		 */
+		pat_bp_enabled = true;
+		pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
 	}
 
 	__init_cache_modes(pat);