Message ID | 20240507061924.20251-1-yan.y.zhao@intel.com (mailing list archive)
---|---
State | New, archived
Series | Enforce CPU cache flush for non-coherent device assignment
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Tuesday, May 7, 2024 2:19 PM
>
> However, lookup_memtype() defaults to returning WB for PFNs within the
> untracked PAT range, regardless of their actual MTRR type. This behavior
> could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> guest access. Such access might result in MCE on certain platforms, (e.g.
> clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).

the VGA range is not exposed to any guest today. So is it just trying to
fix a theoretical problem?

> @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
>   */
>  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
>  {
> -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> +	u64 paddr = PFN_PHYS(pfn);
> +	enum page_cache_mode cm;
> +
> +	/*
> +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> +	 * returns WB for this range.
> +	 */
> +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);

doing so violates the name of this function. The PAT of the untracked
range is still WB and not immune to UC MTRR.

> +	else
> +		cm = lookup_memtype(paddr);
>
>  	return cm == _PAGE_CACHE_MODE_UC ||
>  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Tuesday, May 7, 2024 2:19 PM
> >
> > However, lookup_memtype() defaults to returning WB for PFNs within the
> > untracked PAT range, regardless of their actual MTRR type. This behavior
> > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > guest access. Such access might result in MCE on certain platforms, (e.g.
> > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).
>
> the VGA range is not exposed to any guest today. So is it just trying to
> fix a theoretical problem?

Yes. Not sure if VGA range is allowed to be exposed to guest in future, given
we have VFIO variant drivers.

>
> > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> >   */
> >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> >  {
> > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > +	u64 paddr = PFN_PHYS(pfn);
> > +	enum page_cache_mode cm;
> > +
> > +	/*
> > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > +	 * returns WB for this range.
> > +	 */
> > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
>
> doing so violates the name of this function. The PAT of the untracked
> range is still WB and not immune to UC MTRR.

Right.
Do you think we can rename this function to something like
pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
too?

>
> > +	else
> > +		cm = lookup_memtype(paddr);
> >
> >  	return cm == _PAGE_CACHE_MODE_UC ||
> >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
On Tue, 7 May 2024 17:12:40 +0800
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 2:19 PM
> > >
> > > However, lookup_memtype() defaults to returning WB for PFNs within the
> > > untracked PAT range, regardless of their actual MTRR type. This behavior
> > > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > > guest access. Such access might result in MCE on certain platforms, (e.g.
> > > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).
> >
> > the VGA range is not exposed to any guest today. So is it just trying to
> > fix a theoretical problem?
>
> Yes. Not sure if VGA range is allowed to be exposed to guest in future, given
> we have VFIO variant drivers.

include/uapi/linux/vfio.h:
	/*
	 * Expose VGA regions defined for PCI base class 03, subclass 00.
	 * This includes I/O port ranges 0x3b0 to 0x3bb and 0x3c0 to 0x3df
	 * as well as the MMIO range 0xa0000 to 0xbffff.  Each implemented
	 * range is found at it's identity mapped offset from the region
	 * offset, for example 0x3b0 is region_info.offset + 0x3b0.  Areas
	 * between described ranges are unimplemented.
	 */
	VFIO_PCI_VGA_REGION_INDEX,

We don't currently support mmap for this region though, so I think we
still don't technically require this, but I guess an mmap through KVM
is theoretically possible.  Thanks,

Alex

> > > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> > >   */
> > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > >  {
> > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > +	u64 paddr = PFN_PHYS(pfn);
> > > +	enum page_cache_mode cm;
> > > +
> > > +	/*
> > > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > > +	 * returns WB for this range.
> > > +	 */
> > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
> >
> > doing so violates the name of this function. The PAT of the untracked
> > range is still WB and not immune to UC MTRR.
> Right.
> Do you think we can rename this function to something like
> pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
> too?
>
> > > +	else
> > > +		cm = lookup_memtype(paddr);
> > >
> > >  	return cm == _PAGE_CACHE_MODE_UC ||
> > >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
> >
On Wed, May 08, 2024 at 04:14:24PM -0600, Alex Williamson wrote:
> On Tue, 7 May 2024 17:12:40 +0800
> Yan Zhao <yan.y.zhao@intel.com> wrote:
>
> > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > >
> > > > However, lookup_memtype() defaults to returning WB for PFNs within the
> > > > untracked PAT range, regardless of their actual MTRR type. This behavior
> > > > could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
> > > > guest access. Such access might result in MCE on certain platforms, (e.g.
> > > > clflush on VGA range (0xA0000-0xBFFFF) triggers MCE on some platforms).
> > >
> > > the VGA range is not exposed to any guest today. So is it just trying to
> > > fix a theoretical problem?
> >
> > Yes. Not sure if VGA range is allowed to be exposed to guest in future, given
> > we have VFIO variant drivers.
>
> include/uapi/linux/vfio.h:
> 	/*
> 	 * Expose VGA regions defined for PCI base class 03, subclass 00.
> 	 * This includes I/O port ranges 0x3b0 to 0x3bb and 0x3c0 to 0x3df
> 	 * as well as the MMIO range 0xa0000 to 0xbffff.  Each implemented
> 	 * range is found at it's identity mapped offset from the region
> 	 * offset, for example 0x3b0 is region_info.offset + 0x3b0.  Areas
> 	 * between described ranges are unimplemented.
> 	 */
> 	VFIO_PCI_VGA_REGION_INDEX,
>
> We don't currently support mmap for this region though, so I think we
> still don't technically require this, but I guess an mmap through KVM
> is theoretically possible.  Thanks,

Thanks, Alex, for pointing it out.
KVM does not mmap this region currently, and I guess KVM will not do the
mmap by itself in the future either.
I added this check for the VGA range because I want to call
pat_pfn_immune_to_uc_mtrr() in arch_clean_nonsnoop_dma() in patch 3 to
exclude VGA ranges from CLFLUSH, as arch_clean_nonsnoop_dma() is under
arch/x86 and not virtualization specific.

Also, as Jason once said that "Nothing about vfio actually guarantees that"
"there's no ISA range" (VGA range), I think KVM might see this range after
hva_to_pfn_remapped() translation, and adding this check may be helpful to
KVM, too.

Thanks
Yan

> > > > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> > > >   */
> > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > >  {
> > > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > +	enum page_cache_mode cm;
> > > > +
> > > > +	/*
> > > > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > > > +	 * returns WB for this range.
> > > > +	 */
> > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
> > >
> > > doing so violates the name of this function. The PAT of the untracked
> > > range is still WB and not immune to UC MTRR.
> > Right.
> > Do you think we can rename this function to something like
> > pfn_of_uncachable_effective_memory_type() and make it work under !pat_enabled()
> > too?
> >
> > > > +	else
> > > > +		cm = lookup_memtype(paddr);
> > > >
> > > >  	return cm == _PAGE_CACHE_MODE_UC ||
> > > >  	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
> > >
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Tuesday, May 7, 2024 5:13 PM
>
> On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 2:19 PM
> > >
> > > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> > >   */
> > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > >  {
> > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > +	u64 paddr = PFN_PHYS(pfn);
> > > +	enum page_cache_mode cm;
> > > +
> > > +	/*
> > > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > > +	 * returns WB for this range.
> > > +	 */
> > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
> >
> > doing so violates the name of this function. The PAT of the untracked
> > range is still WB and not immune to UC MTRR.
> Right.
> Do you think we can rename this function to something like
> pfn_of_uncachable_effective_memory_type() and make it work
> under !pat_enabled()
> too?

let's hear from x86/kvm maintainers for their opinions.

My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
x86 core as the logic there has nothing specific to kvm itself. Also
naming-wise it doesn't really matter whether the pfn is mmio. The
real point is to find the uncacheable memtype in the primary mmu
and then follow it in KVM.

from that point probably a pfn_memtype_uncacheable() reads clearer.
+Tom

On Thu, May 16, 2024, Kevin Tian wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Tuesday, May 7, 2024 5:13 PM
> >
> > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > >
> > > > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> > > >   */
> > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > >  {
> > > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > +	enum page_cache_mode cm;
> > > > +
> > > > +	/*
> > > > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > > > +	 * returns WB for this range.
> > > > +	 */
> > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
> > >
> > > doing so violates the name of this function. The PAT of the untracked
> > > range is still WB and not immune to UC MTRR.
> > Right.
> > Do you think we can rename this function to something like
> > pfn_of_uncachable_effective_memory_type() and make it work under
> > !pat_enabled() too?
>
> let's hear from x86/kvm maintainers for their opinions.
>
> My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
> x86 core as the logic there has nothing specific to kvm itself. Also
> naming-wise it doesn't really matter whether the pfn is mmio. The
> real point is to find the uncacheable memtype in the primary mmu
> and then follow it in KVM.

Yeaaaah, we've got an existing problem there.  When AMD's SME is enabled, KVM
uses kvm_is_mmio_pfn() to determine whether or not to map memory into the guest
as encrypted or plain text.  I.e. KVM really does try to use this helper to
detect MMIO vs. RAM.  I highly doubt that actually works in all setups.

For SME, it seems like the best approach would be to grab the C-Bit from the
host page tables, similar to how KVM uses host_pfn_mapping_level().

SME aside, I don't have objection to moving kvm_is_mmio_pfn() out of KVM.

> from that point probably a pfn_memtype_uncacheable() reads clearer.

or even just pfn_is_memtype_uc()?
> From: Sean Christopherson <seanjc@google.com>
> Sent: Thursday, May 16, 2024 10:07 PM
>
> +Tom
>
> On Thu, May 16, 2024, Kevin Tian wrote:
> > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > Sent: Tuesday, May 7, 2024 5:13 PM
> > >
> > > On Tue, May 07, 2024 at 04:26:37PM +0800, Tian, Kevin wrote:
> > > > > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > > > > Sent: Tuesday, May 7, 2024 2:19 PM
> > > > >
> > > > > @@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
> > > > >   */
> > > > >  bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
> > > > >  {
> > > > > -	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
> > > > > +	u64 paddr = PFN_PHYS(pfn);
> > > > > +	enum page_cache_mode cm;
> > > > > +
> > > > > +	/*
> > > > > +	 * Check MTRR type for untracked pat range since lookup_memtype() always
> > > > > +	 * returns WB for this range.
> > > > > +	 */
> > > > > +	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
> > > > > +		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
> > > >
> > > > doing so violates the name of this function. The PAT of the untracked
> > > > range is still WB and not immune to UC MTRR.
> > > Right.
> > > Do you think we can rename this function to something like
> > > pfn_of_uncachable_effective_memory_type() and make it work under
> > > !pat_enabled() too?
> >
> > let's hear from x86/kvm maintainers for their opinions.
> >
> > My gut-feeling is that kvm_is_mmio_pfn() might be moved into the
> > x86 core as the logic there has nothing specific to kvm itself. Also
> > naming-wise it doesn't really matter whether the pfn is mmio. The
> > real point is to find the uncacheable memtype in the primary mmu
> > and then follow it in KVM.
>
> Yeaaaah, we've got an existing problem there.  When AMD's SME is enabled, KVM
> uses kvm_is_mmio_pfn() to determine whether or not to map memory into the guest
> as encrypted or plain text.  I.e. KVM really does try to use this helper to
> detect MMIO vs. RAM.  I highly doubt that actually works in all setups.
>
> For SME, it seems like the best approach would be to grab the C-Bit from the
> host page tables, similar to how KVM uses host_pfn_mapping_level().

yes that sounds clearer. Checking MMIO vs. RAM is kind of an indirect hint.

>
> SME aside, I don't have objection to moving kvm_is_mmio_pfn() out of KVM.
>
> > from that point probably a pfn_memtype_uncacheable() reads clearer.
>
> or even just pfn_is_memtype_uc()?

yes, better.
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 36b603d0cdde..e85e8c5737ad 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -705,7 +705,17 @@ static enum page_cache_mode lookup_memtype(u64 paddr)
  */
 bool pat_pfn_immune_to_uc_mtrr(unsigned long pfn)
 {
-	enum page_cache_mode cm = lookup_memtype(PFN_PHYS(pfn));
+	u64 paddr = PFN_PHYS(pfn);
+	enum page_cache_mode cm;
+
+	/*
+	 * Check MTRR type for untracked pat range since lookup_memtype() always
+	 * returns WB for this range.
+	 */
+	if (x86_platform.is_untracked_pat_range(paddr, paddr + PAGE_SIZE))
+		cm = pat_x_mtrr_type(paddr, paddr + PAGE_SIZE, _PAGE_CACHE_MODE_WB);
+	else
+		cm = lookup_memtype(paddr);
 
 	return cm == _PAGE_CACHE_MODE_UC ||
 	       cm == _PAGE_CACHE_MODE_UC_MINUS ||
Let pat_pfn_immune_to_uc_mtrr() check MTRR type for PFNs in the untracked
PAT range.

pat_pfn_immune_to_uc_mtrr() is used by KVM to identify MMIO PFNs and give
them UC memory type in the EPT page tables. When
pat_pfn_immune_to_uc_mtrr() identifies a PFN as having a PAT type of
UC/WC/UC-, it indicates that the PFN should be accessed using an
uncacheable memory type. Consequently, KVM maps it with UC in the EPT to
ensure that the guest's memory access is uncacheable.

Internally, pat_pfn_immune_to_uc_mtrr() utilizes lookup_memtype() to
determine the PAT type for a PFN. For a PFN outside the untracked PAT
range, the returned PAT type is either
- the type set by memtype_reserve() (which, in turn, calls
  pat_x_mtrr_type() to adjust the requested type to UC- if the requested
  type is WB but the MTRR type does not match WB), or
- UC-, if memtype_reserve() has not yet been invoked for this PFN.

However, lookup_memtype() defaults to returning WB for PFNs within the
untracked PAT range, regardless of their actual MTRR type. This behavior
could lead KVM to misclassify the PFN as non-MMIO, permitting cacheable
guest access. Such access might result in an MCE on certain platforms
(e.g. clflush on the VGA range (0xA0000-0xBFFFF) triggers an MCE on some
platforms).

Hence, invoke pat_x_mtrr_type() for PFNs within the untracked PAT range
so as to take the MTRR type into account and mitigate potential MCEs.

Fixes: b8d7044bcff7 ("x86/mm: add a function to check if a pfn is UC/UC-/WC")
Cc: Kevin Tian <kevin.tian@intel.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/mm/pat/memtype.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)