diff mbox series

[RESUBMIT] x86/mm: Fix PAT bit missing from page protection modify mask

Message ID 20230519183634.190364-1-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived
Series [RESUBMIT] x86/mm: Fix PAT bit missing from page protection modify mask

Commit Message

Janusz Krzysztofik May 19, 2023, 6:36 p.m. UTC
Visible glitches have been observed when running graphics applications on
Linux under Xen hypervisor.  Those observations have been confirmed with
failures from kms_pwrite_crc Intel GPU test that verifies data coherency
of DRM frame buffer objects using hardware CRC checksums calculated by
display controllers, exposed to userspace via debugfs.  Affected
processing paths have then been identified with new IGT test variants that
mmap the objects using different methods and caching modes [1].

When running as a Xen PV guest, Linux uses Xen provided PAT configuration
which is different from its native one.  In particular, Xen specific PTE
encoding of write-combining caching, likely used by graphics applications,
differs from the Linux default one found among statically defined minimal
set of supported modes.  Since Xen defines PTE encoding of the WC mode as
_PAGE_PAT, it no longer belongs to the minimal set, depends on correct
handling of _PAGE_PAT bit, and can be mismatched with write-back caching.

When a user calls mmap() for a DRM buffer object, DRM device specific
.mmap file operation, called from mmap_region(), takes care of setting PTE
encoding bits in a vm_page_prot field of an associated virtual memory area
structure.  Unfortunately, _PAGE_PAT bit is not preserved when the vma's
.vm_flags are then applied to .vm_page_prot via vm_set_page_prot().  Bits
to be preserved are determined with _PAGE_CHG_MASK symbol that doesn't
cover _PAGE_PAT.  As a consequence, WB caching is requested instead of WC
when running under Xen (also, WP is silently changed to WT, and UC
downgraded to UC_MINUS).  When running on bare metal, WC is not affected,
but WP and WT extra modes are unintentionally replaced with WC and UC,
respectively.

WP and WT modes, encoded with _PAGE_PAT bit set, were introduced by commit
281d4078bec3 ("x86: Make page cache mode a real type").  Care was taken
to extend _PAGE_CACHE_MASK symbol with that additional bit, but that
symbol has never been used for identification of bits preserved when
applying page protection flags.  Support for all cache modes under Xen,
including the problematic WC mode, was then introduced by commit
47591df50512 ("xen: Support Xen pv-domains using PAT").

Extend bitmask used by pgprot_modify() for selecting bits to be preserved
with _PAGE_PAT bit.  However, since that bit can be reused as _PAGE_PSE,
and the _PAGE_CHG_MASK symbol, primarily used by pte_modify(), is likely
intentionally defined with that bit not set, keep that symbol unchanged.

[1] https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/commit/0f0754413f14

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7648
Fixes: 281d4078bec3 ("x86: Make page cache mode a real type")
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: stable@vger.kernel.org # v3.19+
---
 arch/x86/include/asm/pgtable.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
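
The mode changes listed in the commit message follow from how a PAT entry is selected: the index is built from the PAT, PCD and PWT bits of the PTE, so clearing _PAGE_PAT folds entries 4-7 onto entries 0-3. The following is a small userspace sketch of that arithmetic, not kernel code. All `sk_`/`SK_` names are hypothetical, and the two tables are assumptions about the Linux and Xen PAT layouts, chosen to be consistent with the transitions described above; they should be checked against the kernel and Xen sources.

```c
#include <assert.h>
#include <stdint.h>

/* Cache modes, named after the abbreviations used in the commit message. */
enum sk_mode { SK_WB, SK_WC, SK_UC_MINUS, SK_UC, SK_WP, SK_WT };

/* x86 PTE bit positions of the three cache-control bits. */
#define SK_BIT_PWT (1ULL << 3)
#define SK_BIT_PCD (1ULL << 4)
#define SK_BIT_PAT (1ULL << 7)

/* PAT index = PAT:PCD:PWT, with PAT as the most significant bit. */
static unsigned sk_pat_index(uint64_t pte)
{
	return ((pte & SK_BIT_PAT) ? 4u : 0u) |
	       ((pte & SK_BIT_PCD) ? 2u : 0u) |
	       ((pte & SK_BIT_PWT) ? 1u : 0u);
}

/* ASSUMED layouts, indexed by PAT:PCD:PWT (verify against real sources). */
static const enum sk_mode sk_linux_pat[8] = {
	SK_WB, SK_WC, SK_UC_MINUS, SK_UC, SK_WB, SK_WP, SK_UC_MINUS, SK_WT
};
static const enum sk_mode sk_xen_pat[8] = {
	SK_WB, SK_WT, SK_UC_MINUS, SK_UC, SK_WC, SK_WP, SK_UC, SK_UC
};

/* Cache mode a PTE selects under the given table. */
static enum sk_mode sk_mode_of(const enum sk_mode table[8], uint64_t pte)
{
	unsigned idx = sk_pat_index(pte);

	assert(idx < 8);
	return table[idx];
}

/* Cache mode the same PTE selects once the PAT bit has been dropped. */
static enum sk_mode sk_mode_without_pat(const enum sk_mode table[8], uint64_t pte)
{
	return sk_mode_of(table, pte & ~SK_BIT_PAT);
}
```

With these tables, `sk_mode_without_pat(sk_xen_pat, SK_BIT_PAT)` yields `SK_WB` for an encoding that `sk_mode_of()` reports as `SK_WC`, which is the WC-to-WB mismatch reported under Xen.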

Comments

Andi Shyti May 31, 2023, 9:35 a.m. UTC | #1
Hi,

a kind reminder about this fix.

Andi

On Fri, May 19, 2023 at 08:36:34PM +0200, Janusz Krzysztofik wrote:
> Visible glitches have been observed when running graphics applications on
> Linux under Xen hypervisor.  Those observations have been confirmed with
> failures from kms_pwrite_crc Intel GPU test that verifies data coherency
> of DRM frame buffer objects using hardware CRC checksums calculated by
> display controllers, exposed to userspace via debugfs.  Affected
> processing paths have then been identified with new IGT test variants that
> mmap the objects using different methods and caching modes [1].
> 
> When running as a Xen PV guest, Linux uses Xen provided PAT configuration
> which is different from its native one.  In particular, Xen specific PTE
> encoding of write-combining caching, likely used by graphics applications,
> differs from the Linux default one found among statically defined minimal
> set of supported modes.  Since Xen defines PTE encoding of the WC mode as
> _PAGE_PAT, it no longer belongs to the minimal set, depends on correct
> handling of _PAGE_PAT bit, and can be mismatched with write-back caching.
> 
> When a user calls mmap() for a DRM buffer object, DRM device specific
> .mmap file operation, called from mmap_region(), takes care of setting PTE
> encoding bits in a vm_page_prot field of an associated virtual memory area
> structure.  Unfortunately, _PAGE_PAT bit is not preserved when the vma's
> .vm_flags are then applied to .vm_page_prot via vm_set_page_prot().  Bits
> to be preserved are determined with _PAGE_CHG_MASK symbol that doesn't
> cover _PAGE_PAT.  As a consequence, WB caching is requested instead of WC
> when running under Xen (also, WP is silently changed to WT, and UC
> downgraded to UC_MINUS).  When running on bare metal, WC is not affected,
> but WP and WT extra modes are unintentionally replaced with WC and UC,
> respectively.
> 
> WP and WT modes, encoded with _PAGE_PAT bit set, were introduced by commit
> 281d4078bec3 ("x86: Make page cache mode a real type").  Care was taken
> to extend _PAGE_CACHE_MASK symbol with that additional bit, but that
> symbol has never been used for identification of bits preserved when
> applying page protection flags.  Support for all cache modes under Xen,
> including the problematic WC mode, was then introduced by commit
> 47591df50512 ("xen: Support Xen pv-domains using PAT").
> 
> Extend bitmask used by pgprot_modify() for selecting bits to be preserved
> with _PAGE_PAT bit.  However, since that bit can be reused as _PAGE_PSE,
> and the _PAGE_CHG_MASK symbol, primarily used by pte_modify(), is likely
> intentionally defined with that bit not set, keep that symbol unchanged.
> 
> [1] https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/commit/0f0754413f14
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/7648
> Fixes: 281d4078bec3 ("x86: Make page cache mode a real type")
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: stable@vger.kernel.org # v3.19+
> ---
>  arch/x86/include/asm/pgtable.h | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
> index 15ae4d6ba4768..56466afd04307 100644
> --- a/arch/x86/include/asm/pgtable.h
> +++ b/arch/x86/include/asm/pgtable.h
> @@ -654,8 +654,10 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>  #define pgprot_modify pgprot_modify
>  static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>  {
> -	pgprotval_t preservebits = pgprot_val(oldprot) & _PAGE_CHG_MASK;
> -	pgprotval_t addbits = pgprot_val(newprot) & ~_PAGE_CHG_MASK;
> +	unsigned long mask = _PAGE_CHG_MASK | _PAGE_CACHE_MASK;
> +
> +	pgprotval_t preservebits = pgprot_val(oldprot) & mask;
> +	pgprotval_t addbits = pgprot_val(newprot) & ~mask;
>  	return __pgprot(preservebits | addbits);
>  }
>  
> -- 
> 2.40.1

Jürgen Groß June 1, 2023, 8:47 a.m. UTC | #2
On 31.05.23 20:14, Borislav Petkov wrote:
> On Fri, May 19, 2023 at 08:36:34PM +0200, Janusz Krzysztofik wrote:
>> diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
>> index 15ae4d6ba4768..56466afd04307 100644
>> --- a/arch/x86/include/asm/pgtable.h
>> +++ b/arch/x86/include/asm/pgtable.h
>> @@ -654,8 +654,10 @@ static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
>>   #define pgprot_modify pgprot_modify
>>   static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
>>   {
>> -	pgprotval_t preservebits = pgprot_val(oldprot) & _PAGE_CHG_MASK;
>> -	pgprotval_t addbits = pgprot_val(newprot) & ~_PAGE_CHG_MASK;
>> +	unsigned long mask = _PAGE_CHG_MASK | _PAGE_CACHE_MASK;
>> +
>> +	pgprotval_t preservebits = pgprot_val(oldprot) & mask;
>> +	pgprotval_t addbits = pgprot_val(newprot) & ~mask;
>>   	return __pgprot(preservebits | addbits);
>>   }
>>   
>> -- 
> 
> This certainly needs Jürgen and he's on CC already, moving him to To:.
> 
> Also, why isn't this a Xen-specific fix but you're keeping _PAGE_PAT for
> baremetal too, i.e., modifying the generic function?
> 

As described in the commit message, this only works on bare metal due to the
PAT bit not being needed for WC mappings.

Making this patch Xen specific would try to cure the symptoms without fixing
the underlying problem: _PAGE_PAT should be regarded the same way as the bits
for caching mode (_PAGE_CHG_MASK).

In case a WP or WT mapped memory area would be mmap()-ed on bare metal, the
result would be a WC or UC mapped memory area in userland. This isn't as
problematic as the case under Xen, but it still results in worse performance
than necessary.

IOW:

Acked-by: Juergen Gross <jgross@suse.com>


Juergen

Borislav Petkov June 2, 2023, 2:43 p.m. UTC | #3
On Thu, Jun 01, 2023 at 10:47:39AM +0200, Juergen Gross wrote:
> As described in the commit message, this only works on bare metal due to the
> PAT bit not being needed for WC mappings.
>
> Making this patch Xen specific would try to cure the symptoms without fixing
> the underlying problem: _PAGE_PAT should be regarded the same way as the bits
> for caching mode (_PAGE_CHG_MASK).

So why isn't _PAGE_PAT part of _PAGE_CHG_MASK?

It says above it "Set of bits not changed in pte_modify."

And I don't see pte_modify() changing that bit either.

Right now this "fix" looks like, "let's OR these two masks so that we
can take care of _PAGE_PAT too". But it doesn't make a whole lotta sense
to me...

Jürgen Groß June 2, 2023, 2:48 p.m. UTC | #4
On 02.06.23 16:43, Borislav Petkov wrote:
> On Thu, Jun 01, 2023 at 10:47:39AM +0200, Juergen Gross wrote:
>> As described in the commit message, this only works on bare metal due to the
>> PAT bit not being needed for WC mappings.
>>
>> Making this patch Xen specific would try to cure the symptoms without fixing
>> the underlying problem: _PAGE_PAT should be regarded the same way as the bits
>> for caching mode (_PAGE_CHG_MASK).
> 
> So why isn't _PAGE_PAT part of _PAGE_CHG_MASK?

This would result in problems for large pages: _PAGE_PSE is at the same
position as _PAGE_PAT (large pages are using _PAGE_PAT_LARGE instead).

Yes, x86 ABI is a mess.


Juergen

Jürgen Groß June 2, 2023, 2:53 p.m. UTC | #5
On 02.06.23 16:48, Juergen Gross wrote:
> On 02.06.23 16:43, Borislav Petkov wrote:
>> On Thu, Jun 01, 2023 at 10:47:39AM +0200, Juergen Gross wrote:
>>> As described in the commit message, this only works on bare metal due to the
>>> PAT bit not being needed for WC mappings.
>>>
>>> Making this patch Xen specific would try to cure the symptoms without fixing
>>> the underlying problem: _PAGE_PAT should be regarded the same way as the bits
>>> for caching mode (_PAGE_CHG_MASK).
>>
>> So why isn't _PAGE_PAT part of _PAGE_CHG_MASK?
> 
> This would result in problems for large pages: _PAGE_PSE is at the same
> position as _PAGE_PAT (large pages are using _PAGE_PAT_LARGE instead).
> 
> Yes, x86 ABI is a mess.

Oh, wait: I originally thought _PAGE_CHG_MASK would be used for large pages,
too. There is _HPAGE_CHG_MASK for that purpose.

So adding _PAGE_PAT to _PAGE_CHG_MASK and _PAGE_PAT_LARGE to _HPAGE_CHG_MASK
should do the job. At least I hope so.


Juergen

Janusz Krzysztofik June 5, 2023, 3:51 p.m. UTC | #6
(fixed misspelled Cc: email address of intel-gfx list)

On Friday, 2 June 2023 16:53:30 CEST Juergen Gross wrote:
> On 02.06.23 16:48, Juergen Gross wrote:
> > On 02.06.23 16:43, Borislav Petkov wrote:
> >> On Thu, Jun 01, 2023 at 10:47:39AM +0200, Juergen Gross wrote:
> >>> As described in the commit message, this only works on bare metal due to the
> >>> PAT bit not being needed for WC mappings.
> >>>
> >>> Making this patch Xen specific would try to cure the symptoms without fixing
> >>> the underlying problem: _PAGE_PAT should be regarded the same way as the bits
> >>> for caching mode (_PAGE_CHG_MASK).
> >>
> >> So why isn't _PAGE_PAT part of _PAGE_CHG_MASK?
> > 
> > This would result in problems for large pages: _PAGE_PSE is at the same
> > position as _PAGE_PAT (large pages are using _PAGE_PAT_LARGE instead).
> > 
> > Yes, x86 ABI is a mess.
> 
> Oh, wait: I originally thought _PAGE_CHG_MASK would be used for large pages,
> too. There is _HPAGE_CHG_MASK for that purpose.

Since _HPAGE_CHG_MASK already has the _PAGE_PSE aka _PAGE_PAT bit set, while
_PAGE_CHG_MASK does not, the real question is not about large page processing,
I believe, which won't change whether we add _PAGE_PAT to _PAGE_CHG_MASK or
not.

If we extend _PAGE_CHG_MASK with the _PAGE_PAT bit, then its value will not be
any different from _HPAGE_CHG_MASK.  Then, one may ask why _HPAGE_CHG_MASK, with
the _PAGE_PSE aka _PAGE_PAT bit set unlike in _PAGE_CHG_MASK, was once introduced
for use with large pages, and _PAGE_CHG_MASK left intact for use with standard
pages, if we now think that adding that bit to _PAGE_CHG_MASK won't break
processing of standard pages.

If we are sure that adding _PAGE_PAT to _PAGE_CHG_MASK won't break any of its 
users then let's go for it.

Thanks,
Janusz

> 
> So adding _PAGE_PAT to _PAGE_CHG_MASK and _PAGE_PAT_LARGE to _HPAGE_CHG_MASK
> should do the job. At least I hope so.
> 
> 
> Juergen
>
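
The suggestion quoted above (adding _PAGE_PAT to _PAGE_CHG_MASK and _PAGE_PAT_LARGE to _HPAGE_CHG_MASK) can be read as keeping two separate masks, one per page size. The following is a hypothetical userspace sketch of that reading, not a proposed patch and not necessarily what was eventually merged. The `SK_` names are illustrative, the bit positions follow the x86 page-table entry layout, and the masks omit the PFN bits and most flag bits for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_PRESENT   (1ULL << 0)
#define SK_PAGE_RW        (1ULL << 1)
#define SK_PAGE_PWT       (1ULL << 3)
#define SK_PAGE_PCD       (1ULL << 4)
#define SK_PAGE_ACCESSED  (1ULL << 5)
#define SK_PAGE_DIRTY     (1ULL << 6)
#define SK_PAGE_PSE       (1ULL << 7)   /* large-page marker in PMD/PUD entries */
#define SK_PAGE_PAT       (1ULL << 7)   /* PAT bit in 4K PTEs, same position */
#define SK_PAGE_PAT_LARGE (1ULL << 12)  /* PAT bit in large-page entries */

/* Simplified set of bits preserved for every page size (an assumption). */
#define SK_COMMON_MASK \
	(SK_PAGE_PWT | SK_PAGE_PCD | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)

/* Per-size masks along the lines of the suggestion. */
#define SK_CHG_MASK_4K   (SK_COMMON_MASK | SK_PAGE_PAT)
#define SK_CHG_MASK_HUGE (SK_COMMON_MASK | SK_PAGE_PSE | SK_PAGE_PAT_LARGE)

/* Keep the bits in chg_mask from oldval, take all other bits from newprot. */
static uint64_t sk_modify(uint64_t oldval, uint64_t newprot, uint64_t chg_mask)
{
	return (oldval & chg_mask) | (newprot & ~chg_mask);
}

static void sk_demo_masks(void)
{
	/* 4K entry: the PAT bit survives a protection change. */
	assert(sk_modify(SK_PAGE_PRESENT | SK_PAGE_PAT,
			 SK_PAGE_PRESENT | SK_PAGE_RW, SK_CHG_MASK_4K) ==
	       (SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_PAT));

	/* Large entry: both PSE and the relocated PAT bit survive. */
	assert(sk_modify(SK_PAGE_PRESENT | SK_PAGE_PSE | SK_PAGE_PAT_LARGE,
			 SK_PAGE_PRESENT | SK_PAGE_RW, SK_CHG_MASK_HUGE) ==
	       (SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_PSE | SK_PAGE_PAT_LARGE));

	/* Bit 7 is shared, so adding PAT or PSE gives the same numeric mask. */
	assert((SK_COMMON_MASK | SK_PAGE_PAT) == (SK_COMMON_MASK | SK_PAGE_PSE));
}
```

The last assertion mirrors the point raised in the reply: because _PAGE_PAT and _PAGE_PSE share a bit position, only the _PAGE_PAT_LARGE bit distinguishes the two masks in this sketch.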

Patch

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 15ae4d6ba4768..56466afd04307 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -654,8 +654,10 @@  static inline pmd_t pmd_modify(pmd_t pmd, pgprot_t newprot)
 #define pgprot_modify pgprot_modify
 static inline pgprot_t pgprot_modify(pgprot_t oldprot, pgprot_t newprot)
 {
-	pgprotval_t preservebits = pgprot_val(oldprot) & _PAGE_CHG_MASK;
-	pgprotval_t addbits = pgprot_val(newprot) & ~_PAGE_CHG_MASK;
+	unsigned long mask = _PAGE_CHG_MASK | _PAGE_CACHE_MASK;
+
+	pgprotval_t preservebits = pgprot_val(oldprot) & mask;
+	pgprotval_t addbits = pgprot_val(newprot) & ~mask;
 	return __pgprot(preservebits | addbits);
 }
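
The effect of the mask change can be reproduced outside the kernel. The following is a minimal userspace sketch, not kernel code: the `sk_`/`SK_` names are hypothetical, the bit positions follow the x86 PTE layout, and `SK_CHG_MASK` is a deliberately trimmed-down stand-in for _PAGE_CHG_MASK (the real mask contains more flag bits). It contrasts the old and the patched pgprot_modify() logic for the Xen WC encoding, which per the commit message is _PAGE_PAT alone.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sk_pgprotval_t;   /* stand-in for the kernel's pgprotval_t */

#define SK_PAGE_PRESENT  (1ULL << 0)
#define SK_PAGE_RW       (1ULL << 1)
#define SK_PAGE_USER     (1ULL << 2)
#define SK_PAGE_PWT      (1ULL << 3)
#define SK_PAGE_PCD      (1ULL << 4)
#define SK_PAGE_ACCESSED (1ULL << 5)
#define SK_PAGE_DIRTY    (1ULL << 6)
#define SK_PAGE_PAT      (1ULL << 7)
#define SK_PFN_MASK      0x000ffffffffff000ULL

/* Simplified model of _PAGE_CHG_MASK: note that the PAT bit is absent. */
#define SK_CHG_MASK \
	(SK_PFN_MASK | SK_PAGE_PCD | SK_PAGE_PWT | SK_PAGE_ACCESSED | SK_PAGE_DIRTY)
#define SK_CACHE_MASK (SK_PAGE_PWT | SK_PAGE_PCD | SK_PAGE_PAT)

/* Logic before the patch: only SK_CHG_MASK bits are kept from oldprot. */
static sk_pgprotval_t sk_modify_old(sk_pgprotval_t oldprot, sk_pgprotval_t newprot)
{
	return (oldprot & SK_CHG_MASK) | (newprot & ~SK_CHG_MASK);
}

/* Logic after the patch: all cache bits, including PAT, are kept as well. */
static sk_pgprotval_t sk_modify_new(sk_pgprotval_t oldprot, sk_pgprotval_t newprot)
{
	const sk_pgprotval_t mask = SK_CHG_MASK | SK_CACHE_MASK;

	return (oldprot & mask) | (newprot & ~mask);
}

static void sk_demo(void)
{
	/* Protection set by a driver's .mmap handler: WC in the Xen encoding. */
	const sk_pgprotval_t wc_xen =
		SK_PAGE_PRESENT | SK_PAGE_RW | SK_PAGE_USER | SK_PAGE_PAT;
	/* Protection derived from vm_flags: read-only, no cache bits set. */
	const sk_pgprotval_t from_flags = SK_PAGE_PRESENT | SK_PAGE_USER;

	/* Old logic: the PAT bit is lost, so the cache bits read as all zero. */
	assert((sk_modify_old(wc_xen, from_flags) & SK_CACHE_MASK) == 0);
	/* New logic: the PAT bit is preserved. */
	assert((sk_modify_new(wc_xen, from_flags) & SK_CACHE_MASK) == SK_PAGE_PAT);
	/* The access bits still come from newprot in both variants. */
	assert(!(sk_modify_new(wc_xen, from_flags) & SK_PAGE_RW));
}
```

Calling `sk_demo()` from a test program exercises both variants. The only difference between them is the extra `SK_CACHE_MASK` term, which corresponds to the `_PAGE_CHG_MASK | _PAGE_CACHE_MASK` expression introduced by the patch.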