
[1/5] x86/pat: Let pat_pfn_immune_to_uc_mtrr() check MTRR for untracked PAT range

Message ID 20240507061924.20251-1-yan.y.zhao@intel.com (mailing list archive)
State New, archived
Series Enforce CPU cache flush for non-coherent device assignment

Commit Message

Yan Zhao May 7, 2024, 6:19 a.m. UTC
Let pat_pfn_immune_to_uc_mtrr() check MTRR type for PFNs in untracked PAT
range.

pat_pfn_immune_to_uc_mtrr() is used by KVM to identify MMIO PFNs and map
them with the UC memory type in the EPT page tables.
When pat_pfn_immune_to_uc_mtrr() identifies a PFN as having a PAT type of
UC/WC/UC-, it indicates that the PFN should be accessed using an
uncacheable memory type. Consequently, KVM maps it with UC in the EPT to
ensure that the guest's memory access is uncacheable.

Internally, pat_pfn_immune_to_uc_mtrr() utilizes lookup_memtype() to
determine PAT type for a PFN. For a PFN outside untracked PAT range, the
returned PAT type is either
- The type set by memtype_reserve()
  (which, in turn, calls pat_x_mtrr_type() to adjust the requested type to
   UC- if the requested type is WB but the MTRR type does not match WB),
- Or UC-, if memtype_reserve() has not yet been invoked for this PFN.

However, lookup_memtype() defaults to returning WB for PFNs within the
untracked PAT range, regardless of their actual MTRR type. This behavior
could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
guest access. Such access might result in an MCE on certain platforms (e.g.
clflush on the VGA range 0xA0000-0xBFFFF triggers an MCE on some of them).

Hence, invoke pat_x_mtrr_type() for PFNs within the untracked PAT range so
that the MTRR type is taken into account, mitigating potential MCEs.

Fixes: b8d7044bcff7 ("x86/mm: add a function to check if a pfn is UC/UC-/WC")
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/mm/pat/memtype.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Tian, Kevin May 7, 2024, 8:26 a.m. UTC | #1
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Tuesday, May 7, 2024 2:19 PM
> 
> However, lookup_memtype() defaults to returning WB for PFNs within the
> untracked PAT range, regardless of their actual MTRR type. This behavior
> could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> guest access. Such access might result in MCE on certain platforms, (e.g.
> clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).

the VGA range is not exposed to any guest today. So is it just trying to
fix a theoretical problem?

> @@ -705,7 +705,17 @@ static enum page_cache_mode
> lookup_memtype(u64 paddr)
>   */
>  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
>  {
> -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> +	u64 paddr = PFN_PHYS(pfn);
> +	enum page_cache_mode cm;
> +
> +	/*
> +	 * Check MTRR type for untracked pat range since lookup_memtype()
> always
> +	 * returns WB for this range.
> +	 */
> +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> _PAGE_CACHE_MODE_WB);

doing so violates the name of this function. The PAT of the untracked
range is still WB and not immune to UC MTRR.

> +	else
> +		cm = lookup_memtype(paddr);
> 
>  	return cm == _PAGE_CACHE_MODE_UC ||
>  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
Yan Zhao May 7, 2024, 9:12 a.m. UTC | #2
On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Tuesday, May 7, 2024 2:19 PM
> > 
> > However, lookup_memtype() defaults to returning WB for PFNs within the
> > untracked PAT range, regardless of their actual MTRR type. This behavior
> > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > guest access. Such access might result in MCE on certain platforms, (e.g.
> > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).
> 
> the VGA range is not exposed to any guest today. So is it just trying to
> fix a theoretical problem?

Yes. I'm not sure whether the VGA range will be allowed to be exposed to guests
in the future, given that we have VFIO variant drivers.

> 
> > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > lookup_memtype(u64 paddr)
> >   */
> >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> >  {
> > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > +	u64 paddr = PFN_PHYS(pfn);
> > +	enum page_cache_mode cm;
> > +
> > +	/*
> > +	 * Check MTRR type for untracked pat range since lookup_memtype()
> > always
> > +	 * returns WB for this range.
> > +	 */
> > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > _PAGE_CACHE_MODE_WB);
> 
> doing so violates the name of this function. The PAT of the untracked
> range is still WB and not immune to UC MTRR.
Right.
Do you think we can rename this function to something like
pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
too?

> 
> > +	else
> > +		cm = lookup_memtype(paddr);
> > 
> >  	return cm == _PAGE_CACHE_MODE_UC ||
> >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
>
Alex Williamson May 8, 2024, 10:14 p.m. UTC | #3
On Tue, 7 May 2024 17:12:40 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > 
> > > However, lookup_memtype() defaults to returning WB for PFNs within the
> > > untracked PAT range, regardless of their actual MTRR type. This behavior
> > > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > > guest access. Such access might result in MCE on certain platforms, (e.g.
> > > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).  
> > 
> > the VGA range is not exposed to any guest today. So is it just trying to
> > fix a theoretical problem?  
> 
> Yes. Not sure if VGA range is allowed to be exposed to guest in future, given
> we have VFIO variant drivers.

include/uapi/linux/vfio.h:
        /*
         * Expose VGA regions defined for PCI base class 03, subclass 00.
         * This includes I/O port ranges 0x3b0 to 0x3bb and 0x3c0 to 0x3df
         * as well as the MMIO range 0xa0000 to 0xbffff.  Each implemented
         * range is found at it's identity mapped offset from the region
         * offset, for example 0x3b0 is region_info.offset + 0x3b0.  Areas
         * between described ranges are unimplemented.
         */
        VFIO_PCI_VGA_REGION_INDEX,

We don't currently support mmap for this region though, so I think we
still don't technically require this, but I guess an mmap through KVM
is theoretically possible.  Thanks,

Alex

> > > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > > lookup_memtype(u64 paddr)
> > >   */
> > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > >  {
> > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > +	u64 paddr = PFN_PHYS(pfn);
> > > +	enum page_cache_mode cm;
> > > +
> > > +	/*
> > > +	 * Check MTRR type for untracked pat range since lookup_memtype()
> > > always
> > > +	 * returns WB for this range.
> > > +	 */
> > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > > _PAGE_CACHE_MODE_WB);  
> > 
> > doing so violates the name of this function. The PAT of the untracked
> > range is still WB and not immune to UC MTRR.  
> Right.
> Do you think we can rename this function to something like
> pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
> too?
> 
> >   
> > > +	else
> > > +		cm = lookup_memtype(paddr);
> > > 
> > >  	return cm == _PAGE_CACHE_MODE_UC ||
> > >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||  
> >   
>
Yan Zhao May 9, 2024, 3:36 a.m. UTC | #4
On Wed, May 08, 2024 at 04:14:24PM -0600, Alex Williamson wrote:
> On Tue, 7 May 2024 17:12:40 +0800
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > > 
> > > > However, lookup_memtype() defaults to returning WB for PFNs within the
> > > > untracked PAT range, regardless of their actual MTRR type. This behavior
> > > > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > > > guest access. Such access might result in MCE on certain platforms, (e.g.
> > > > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).  
> > > 
> > > the VGA range is not exposed to any guest today. So is it just trying to
> > > fix a theoretical problem?  
> > 
> > Yes. Not sure if VGA range is allowed to be exposed to guest in future, given
> > we have VFIO variant drivers.
> 
> include/uapi/linux/vfio.h:
>         /*
>          * Expose VGA regions defined for PCI base class 03, subclass 00.
>          * This includes I/O port ranges 0x3b0 to 0x3bb and 0x3c0 to 0x3df
>          * as well as the MMIO range 0xa0000 to 0xbffff.  Each implemented
>          * range is found at it's identity mapped offset from the region
>          * offset, for example 0x3b0 is region_info.offset + 0x3b0.  Areas
>          * between described ranges are unimplemented.
>          */
>         VFIO_PCI_VGA_REGION_INDEX,
> 
> We don't currently support mmap for this region though, so I think we
> still don't technically require this, but I guess an mmap through KVM
> is theoretically possible.  Thanks,

Thanks, Alex, for pointing it out.
KVM does not mmap this region currently, and I guess KVM will not do the mmap
by itself in the future either.

I added this check for the VGA range because I want to call
pat_pfn_immune_to_uc_mtrr() in arch_clean_nonsnoop_dma() in patch 3 to exclude
VGA ranges from CLFLUSH, as arch_clean_nonsnoop_dma() is under arch/x86 and not
virtualization specific.

Also, as Jason once said, "Nothing about vfio actually guarantees that"
"there's no ISA range" (VGA range), so I think KVM might see this range after
hva_to_pfn_remapped() translation, and adding this check may be helpful to KVM,
too.

Thanks
Yan

> 
> > > > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > > > lookup_memtype(u64 paddr)
> > > >   */
> > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > >  {
> > > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > +	enum page_cache_mode cm;
> > > > +
> > > > +	/*
> > > > +	 * Check MTRR type for untracked pat range since lookup_memtype()
> > > > always
> > > > +	 * returns WB for this range.
> > > > +	 */
> > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > > > _PAGE_CACHE_MODE_WB);  
> > > 
> > > doing so violates the name of this function. The PAT of the untracked
> > > range is still WB and not immune to UC MTRR.  
> > Right.
> > Do you think we can rename this function to something like
> > pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
> > too?
> > 
> > >   
> > > > +	else
> > > > +		cm = lookup_memtype(paddr);
> > > > 
> > > >  	return cm == _PAGE_CACHE_MODE_UC ||
> > > >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||  
> > >   
> > 
>
Tian, Kevin May 16, 2024, 7:42 a.m. UTC | #5
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Tuesday, May 7, 2024 5:13 PM
> 
> On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 2:19 PM
> > >
> > > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > > lookup_memtype(u64 paddr)
> > >   */
> > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > >  {
> > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > +	u64 paddr = PFN_PHYS(pfn);
> > > +	enum page_cache_mode cm;
> > > +
> > > +	/*
> > > +	 * Check MTRR type for untracked pat range since lookup_memtype()
> > > always
> > > +	 * returns WB for this range.
> > > +	 */
> > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > > _PAGE_CACHE_MODE_WB);
> >
> > doing so violates the name of this function. The PAT of the untracked
> > range is still WB and not immune to UC MTRR.
> Right.
> Do you think we can rename this function to something like
> pfn_of_uncachable_effective_memory_type() and make it work
> under !pat_enabled()
> too?

let's hear from x86/kvm maintainers for their opinions.

My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
x86 core as the logic there has nothing specific to kvm itself. Also
naming-wise it doesn't really matter whether the pfn is mmio. The
real point is to find the uncacheable memtype in the primary mmu
and then follow it in KVM.

from that point probably a pfn_memtype_uncacheable() reads clearer.
Sean Christopherson May 16, 2024, 2:07 p.m. UTC | #6
+Tom

On Thu, May 16, 2024, Kevin Tian wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Tuesday, May 7, 2024 5:13 PM
> > 
> > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > >
> > > > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > > > lookup_memtype(u64 paddr)
> > > >   */
> > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > >  {
> > > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > +	enum page_cache_mode cm;
> > > > +
> > > > +	/*
> > > > +	 * Check MTRR type for untracked pat range since lookup_memtype()
> > > > always
> > > > +	 * returns WB for this range.
> > > > +	 */
> > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > > > _PAGE_CACHE_MODE_WB);
> > >
> > > doing so violates the name of this function. The PAT of the untracked
> > > range is still WB and not immune to UC MTRR.
> > Right.
> > Do you think we can rename this function to something like
> > pfn_of_uncachable_effective_memory_type() and make it work under
> > !pat_enabled() too?
> 
> let's hear from x86/kvm maintainers for their opinions.
> 
> My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
> x86 core as the logic there has nothing specific to kvm itself. Also
> naming-wise it doesn't really matter whether the pfn is mmio. The
> real point is to find the uncacheable memtype in the primary mmu
> and then follow it in KVM.

Yeaaaah, we've got an existing problem there.  When AMD's SME is enabled, KVM
uses kvm_is_mmio_pfn() to determine whether or not to map memory into the guest
as encrypted or plain text.  I.e. KVM really does try to use this helper to
detect MMIO vs. RAM.  I highly doubt that actually works in all setups.

For SME, it seems like the best approach would be to grab the C-Bit from the host
page tables, similar to how KVM uses host_pfn_mapping_level().

SME aside, I don't have objection to moving kvm_is_mmio_pfn() out of KVM.

> from that point probably a pfn_memtype_uncacheable() reads clearer.

or even just pfn_is_memtype_uc()?
Tian, Kevin May 20, 2024, 2:36 a.m. UTC | #7
> From: Sean Christopherson <seanjc@google.com>
> Sent: Thursday, May 16, 2024 10:07 PM
> 
> +Tom
> 
> On Thu, May 16, 2024, Kevin Tian wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 5:13 PM
> > >
> > > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > > >
> > > > > @@ -705,7 +705,17 @@ static enum page_cache_mode
> > > > > lookup_memtype(u64 paddr)
> > > > >   */
> > > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > > >  {
> > > > > -	enum page_cache_mode cm =
> lookup_memtype(PFN_PHYS(pfn));
> > > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > > +	enum page_cache_mode cm;
> > > > > +
> > > > > +	/*
> > > > > +	 * Check MTRR type for untracked pat range since
> lookup_memtype()
> > > > > always
> > > > > +	 * returns WB for this range.
> > > > > +	 */
> > > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr +
> PAGE_SIZE))
> > > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE,
> > > > > _PAGE_CACHE_MODE_WB);
> > > >
> > > > doing so violates the name of this function. The PAT of the untracked
> > > > range is still WB and not immune to UC MTRR.
> > > Right.
> > > Do you think we can rename this function to something like
> > > pfn_of_uncachable_effective_memory_type() and make it work under
> > > !pat_enabled() too?
> >
> > let's hear from x86/kvm maintainers for their opinions.
> >
> > My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
> > x86 core as the logic there has nothing specific to kvm itself. Also
> > naming-wise it doesn't really matter whether the pfn is mmio. The
> > real point is to find the uncacheable memtype in the primary mmu
> > and then follow it in KVM.
> 
> Yeaaaah, we've got an existing problem there.  When AMD's SME is enabled,
> KVM
> uses kvm_is_mmio_pfn() to determine whether or not to map memory into
> the guest
> as encrypted or plain text.  I.e. KVM really does try to use this helper to
> detect MMIO vs. RAM.  I highly doubt that actually works in all setups.
> 
> For SME, it seems like the best approach would be grab the C-Bit from the
> host
> page tables, similar to how KVM uses host_pfn_mapping_level().

yes that sounds clearer. Checking MMIO vs. RAM is kind of an indirect hint.

> 
> SME aside, I don't have objection to moving kvm_is_mmio_pfn() out of KVM.
> 
> > from that point probably a pfn_memtype_uncacheable() reads clearer.
> 
> or even just pfn_is_memtype_uc()?

yes, better.

Patch

diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 36b603d0cdde..e85e8c5737ad 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -705,7 +705,17 @@  static enum page_cache_mode lookup_memtype(u64 paddr)
  */
 bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
 {
-	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
+	u64 paddr = PFN_PHYS(pfn);
+	enum page_cache_mode cm;
+
+	/*
+	 * Check MTRR type for untracked pat range since lookup_memtype() always
+	 * returns WB for this range.
+	 */
+	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
+		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
+	else
+		cm = lookup_memtype(paddr);
 
 	return cm == _PAGE_CACHE_MODE_UC ||
 	       cm == _PAGE_CACHE_MODE_UC_MINUS ||