
[3/5] x86/mm: Introduce and export interface arch_clean_nonsnoop_dma()

Message ID 20240507062044.20399-1-yan.y.zhao@intel.com (mailing list archive)
State New, archived
Series Enforce CPU cache flush for non-coherent device assignment

Commit Message

Yan Zhao May 7, 2024, 6:20 a.m. UTC
Introduce and export interface arch_clean_nonsnoop_dma() to flush CPU
caches for memory involved in non-coherent DMAs (DMAs that lack CPU cache
snooping).

When the IOMMU does not enforce cache coherency, devices are allowed to
perform non-coherent DMAs. This poses a risk of information leakage when
the device is assigned to a VM: a malicious guest could retrieve stale host
data through non-coherent DMA reads of physical memory while the data
written by the host (e.g. zeros) still resides only in the CPU cache and
has not yet reached memory.

Additionally, the host kernel (e.g. a KSM kthread) may read inconsistent
data from the CPU cache/memory (left behind by a malicious guest) after a
page is unpinned for non-coherent DMA but before it is freed.

Therefore, VFIO/IOMMUFD must initiate a CPU cache flush for pages involved
in non-coherent DMAs, either before mapping them into the IOMMU or after
unmapping them from it.

Introduce and export an interface that accepts a contiguous physical
address range as input and flushes the CPU caches in an architecture
specific way on behalf of VFIO/IOMMUFD (currently x86 only).

CLFLUSH on MMIO ranges is generally undesirable on x86 and can even cause
an MCE on certain platforms (e.g. executing CLFLUSH on the VGA range
0xA0000-0xBFFFF triggers an MCE on some platforms). Meanwhile, some MMIO
ranges are cacheable and do demand CLFLUSH (e.g. certain MMIO ranges for
PMEM). Hence, the host PAT/MTRR is consulted to identify uncacheable memory.

This implementation always performs CLFLUSH on "pfn_valid() && !reserved"
pages (since they cannot be MMIO).
For reserved or !pfn_valid() PFNs, it checks the host PAT/MTRR to skip
uncacheable physical ranges in the host and performs CLFLUSH on the
remaining cacheable ranges.

Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Kevin Tian <kevin.tian@intel.com>
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 arch/x86/include/asm/cacheflush.h |  3 ++
 arch/x86/mm/pat/set_memory.c      | 88 +++++++++++++++++++++++++++++++
 include/linux/cacheflush.h        |  6 +++
 3 files changed, 97 insertions(+)
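
As an editorial illustration only (not part of the patch), below is a minimal
sketch of how a VFIO/IOMMUFD-style caller might use the proposed interface
when tearing down a non-coherent mapping. unmap_noncoherent_range() and its
iova/phys bookkeeping are hypothetical stand-ins for the caller's own code.

#include <linux/iommu.h>
#include <linux/cacheflush.h>

/*
 * Sketch only: flush CPU caches for a physically contiguous range that was
 * mapped for non-coherent DMA, after the IOMMU mapping is torn down and
 * before the pages are unpinned and returned to the host.
 */
static void unmap_noncoherent_range(struct iommu_domain *domain,
				    unsigned long iova, phys_addr_t phys,
				    size_t size)
{
	iommu_unmap(domain, iova, size);

	/* Purge any cache lines left inconsistent by the device */
	arch_clean_nonsnoop_dma(phys, size);
}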

Comments

Tian, Kevin May 7, 2024, 8:51 a.m. UTC | #1
> From: Zhao, Yan Y <yan.y.zhao@intel.com>
> Sent: Tuesday, May 7, 2024 2:21 PM
> 
> +
> +/*
> + * Flush a reserved page or !pfn_valid() PFN.
> + * Flush is not performed if the PFN is accessed in uncacheable type. i.e.
> + * - PAT type is UC/UC-/WC when PAT is enabled
> + * - MTRR type is UC/WC/WT/WP when PAT is not enabled.
> + *   (no need to do CLFLUSH though WT/WP is cacheable).
> + */

As long as a page is cacheable (WB/WT/WP), a malicious guest can
always use non-coherent DMA to make cache and memory inconsistent;
hence clflush is still required after unmapping such a page from the
IOMMU page table to avoid leaking the inconsistent state back to the
host.

> +
> +/**
> + * arch_clean_nonsnoop_dma - flush a cache range for non-coherent DMAs
> + *                           (DMAs that lack CPU cache snooping).
> + * @phys_addr:	physical address start
> + * @length:	number of bytes to flush
> + */
> +void arch_clean_nonsnoop_dma(phys_addr_t phys_addr, size_t length)
> +{
> +	unsigned long nrpages, pfn;
> +	unsigned long i;
> +
> +	pfn = PHYS_PFN(phys_addr);
> +	nrpages = PAGE_ALIGN((phys_addr & ~PAGE_MASK) + length) >>
> PAGE_SHIFT;
> +
> +	for (i = 0; i < nrpages; i++, pfn++)
> +		clflush_pfn(pfn);
> +}
> +EXPORT_SYMBOL_GPL(arch_clean_nonsnoop_dma);

this is not a good name. The code has nothing to do with the non-snoop
DMA aspect; it's just a general helper that accepts a physical PFN and
flushes the CPU cache, with non-snoop DMA as one potential caller.

It's clearer to be arch_flush_cache_phys().

and probably drm_clflush_pages() can be converted to use this
helper too.
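
(For illustration only, not proposed code: on the x86 path such a conversion
might look roughly like the sketch below, assuming the rename to
arch_flush_cache_phys() suggested above; the real drm_clflush_pages() also
has non-x86 and wbinvd fallback paths that are omitted here.)

/* Rough conversion sketch, x86 path of drm_clflush_pages() only */
void drm_clflush_pages(struct page *pages[], unsigned long num_pages)
{
	unsigned long i;

	for (i = 0; i < num_pages; i++)
		arch_flush_cache_phys(page_to_phys(pages[i]), PAGE_SIZE);
}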
Yan Zhao May 7, 2024, 9:40 a.m. UTC | #2
On Tue, May 07, 2024 at 04:51:31PM +0800, Tian, Kevin wrote:
> > From: Zhao, Yan Y <yan.y.zhao@intel.com>
> > Sent: Tuesday, May 7, 2024 2:21 PM
> > 
> > +
> > +/*
> > + * Flush a reserved page or !pfn_valid() PFN.
> > + * Flush is not performed if the PFN is accessed in uncacheable type. i.e.
> > + * - PAT type is UC/UC-/WC when PAT is enabled
> > + * - MTRR type is UC/WC/WT/WP when PAT is not enabled.
> > + *   (no need to do CLFLUSH though WT/WP is cacheable).
> > + */
> 
> As long as a page is cacheable (being WB/WT/WP) the malicious
> guest can always use non-coherent DMA to make cache/memory
> inconsistent, hence clflush is still required after unmapping such
> page from the IOMMU page table to avoid leaking the inconsistency
> state back to the host.
You are right.
I should only check whether the MTRR type is UC or WC, as below.

static void clflush_reserved_or_invalid_pfn(unsigned long pfn)
{
	const int size = boot_cpu_data.x86_clflush_size;
	unsigned int i;
	void *va;

	if (!pat_enabled()) {
		u64 start = PFN_PHYS(pfn), end = start + PAGE_SIZE;
		u8 mtrr_type, uniform;

		mtrr_type = mtrr_type_lookup(start, end, &uniform);
		if (mtrr_type == MTRR_TYPE_UNCACHABLE ||
		    mtrr_type == MTRR_TYPE_WRCOMB)
			return;
	} else if (pat_pfn_immune_to_uc_mtrr(pfn)) {
		return;
	}
	...
}

Also, for the pat_enabled() case where pat_pfn_immune_to_uc_mtrr() is called,
maybe pat_x_mtrr_type() cannot be used in patch 1 for the untracked PAT range,
because pat_x_mtrr_type() returns UC- when the MTRR type is WT/WP, which would
cause pat_pfn_immune_to_uc_mtrr() to return true and CLFLUSH to be skipped.


static unsigned long pat_x_mtrr_type(u64 start, u64 end,
                                     enum page_cache_mode req_type)
{
        /*
         * Look for MTRR hint to get the effective type in case where PAT
         * request is for WB.
         */
        if (req_type == _PAGE_CACHE_MODE_WB) {
                u8 mtrr_type, uniform;

                mtrr_type = mtrr_type_lookup(start, end, &uniform);
                if (mtrr_type != MTRR_TYPE_WRBACK)
                        return _PAGE_CACHE_MODE_UC_MINUS;

                return _PAGE_CACHE_MODE_WB;
        }

        return req_type;
}

> 
> > +
> > +/**
> > + * arch_clean_nonsnoop_dma - flush a cache range for non-coherent DMAs
> > + *                           (DMAs that lack CPU cache snooping).
> > + * @phys_addr:	physical address start
> > + * @length:	number of bytes to flush
> > + */
> > +void arch_clean_nonsnoop_dma(phys_addr_t phys_addr, size_t length)
> > +{
> > +	unsigned long nrpages, pfn;
> > +	unsigned long i;
> > +
> > +	pfn = PHYS_PFN(phys_addr);
> > +	nrpages = PAGE_ALIGN((phys_addr & ~PAGE_MASK) + length) >>
> > PAGE_SHIFT;
> > +
> > +	for (i = 0; i < nrpages; i++, pfn++)
> > +		clflush_pfn(pfn);
> > +}
> > +EXPORT_SYMBOL_GPL(arch_clean_nonsnoop_dma);
> 
> this is not a good name. The code has nothing to do with nonsnoop
> dma aspect. It's just a general helper accepting a physical pfn to flush
> CPU cache, with nonsnoop dma as one potential caller usage.
> 
> It's clearer to be arch_flush_cache_phys().
> 
> and probably drm_clflush_pages() can be converted to use this
> helper too.
Yes, I agree, though arch_clean_nonsnoop_dma() might still have merit if its
implementation on other platforms needed to do something specific to
non-snoop DMA.
Christoph Hellwig May 20, 2024, 2:07 p.m. UTC | #3
On Tue, May 07, 2024 at 02:20:44PM +0800, Yan Zhao wrote:
> Introduce and export interface arch_clean_nonsnoop_dma() to flush CPU
> caches for memory involved in non-coherent DMAs (DMAs that lack CPU cache
> snooping).

Err, no.  There should really be no exported cache manipulation macros,
as drivers are almost guaranteed to get this wrong.  I've added
Russell to the Cc list, who has been extremely vocal about this at least
for arm.
Jason Gunthorpe May 21, 2024, 3:49 p.m. UTC | #4
On Mon, May 20, 2024 at 07:07:10AM -0700, Christoph Hellwig wrote:
> On Tue, May 07, 2024 at 02:20:44PM +0800, Yan Zhao wrote:
> > Introduce and export interface arch_clean_nonsnoop_dma() to flush CPU
> > caches for memory involved in non-coherent DMAs (DMAs that lack CPU cache
> > snooping).
> 
> Err, no.  There should really be no exported cache manipulation macros,
> as drivers are almost guaranteed to get this wrong.  I've added
> Russell to the Cc list who has been extremtly vocal about this at least
> for arm.

We could possibly move this under some IOMMU core API (i.e. flush and
map, unmap and flush); the iommu APIs are non-modular, so this could
avoid the exported symbol.

Jason
Jason Gunthorpe May 21, 2024, 4 p.m. UTC | #5
On Tue, May 21, 2024 at 12:49:39PM -0300, Jason Gunthorpe wrote:
> On Mon, May 20, 2024 at 07:07:10AM -0700, Christoph Hellwig wrote:
> > On Tue, May 07, 2024 at 02:20:44PM +0800, Yan Zhao wrote:
> > > Introduce and export interface arch_clean_nonsnoop_dma() to flush CPU
> > > caches for memory involved in non-coherent DMAs (DMAs that lack CPU cache
> > > snooping).
> > 
> > Err, no.  There should really be no exported cache manipulation macros,
> > as drivers are almost guaranteed to get this wrong.  I've added
> > Russell to the Cc list who has been extremtly vocal about this at least
> > for arm.
> 
> We could possibly move this under some IOMMU core API (ie flush and
> map, unmap and flush), the iommu APIs are non-modular so this could
> avoid the exported symbol.

Though this would be pretty difficult for unmap, as we don't have the
pfns in the core code to flush. I don't think we have a lot of good
options other than to make iommufd & VFIO handle this directly, as they
have the list of pages to flush on the unmap side. Use a namespace?

Jason
Yan Zhao May 22, 2024, 3:41 a.m. UTC | #6
On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> On Tue, May 21, 2024 at 12:49:39PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 20, 2024 at 07:07:10AM -0700, Christoph Hellwig wrote:
> > > On Tue, May 07, 2024 at 02:20:44PM +0800, Yan Zhao wrote:
> > > > Introduce and export interface arch_clean_nonsnoop_dma() to flush CPU
> > > > caches for memory involved in non-coherent DMAs (DMAs that lack CPU cache
> > > > snooping).
> > > 
> > > Err, no.  There should really be no exported cache manipulation macros,
> > > as drivers are almost guaranteed to get this wrong.  I've added
> > > Russell to the Cc list who has been extremtly vocal about this at least
> > > for arm.
> > 
> > We could possibly move this under some IOMMU core API (ie flush and
> > map, unmap and flush), the iommu APIs are non-modular so this could
> > avoid the exported symbol.
> 
> Though this would be pretty difficult for unmap as we don't have the
> pfns in the core code to flush. I don't think we have alot of good
> options but to make iommufd & VFIO handle this directly as they have
> the list of pages to flush on the unmap side. Use a namespace?
Given that we'll rename this function to arch_flush_cache_phys(), which takes
a physical address as input, and that clflush_cache_range() and
arch_invalidate_pmem() are already exported with a vaddr as input, is this
export still acceptable?
Christoph Hellwig May 28, 2024, 6:37 a.m. UTC | #7
On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> > > Err, no.  There should really be no exported cache manipulation macros,
> > > as drivers are almost guaranteed to get this wrong.  I've added
> > > Russell to the Cc list who has been extremtly vocal about this at least
> > > for arm.
> > 
> > We could possibly move this under some IOMMU core API (ie flush and
> > map, unmap and flush), the iommu APIs are non-modular so this could
> > avoid the exported symbol.
> 
> Though this would be pretty difficult for unmap as we don't have the
> pfns in the core code to flush. I don't think we have alot of good
> options but to make iommufd & VFIO handle this directly as they have
> the list of pages to flush on the unmap side. Use a namespace?

Just have an unmap version that also takes the list of PFNs that you'd
need for non-coherent mappings?
Jason Gunthorpe June 1, 2024, 7:46 p.m. UTC | #8
On Mon, May 27, 2024 at 11:37:34PM -0700, Christoph Hellwig wrote:
> On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> > > > Err, no.  There should really be no exported cache manipulation macros,
> > > > as drivers are almost guaranteed to get this wrong.  I've added
> > > > Russell to the Cc list who has been extremtly vocal about this at least
> > > > for arm.
> > > 
> > > We could possibly move this under some IOMMU core API (ie flush and
> > > map, unmap and flush), the iommu APIs are non-modular so this could
> > > avoid the exported symbol.
> > 
> > Though this would be pretty difficult for unmap as we don't have the
> > pfns in the core code to flush. I don't think we have alot of good
> > options but to make iommufd & VFIO handle this directly as they have
> > the list of pages to flush on the unmap side. Use a namespace?
> 
> Just have a unmap version that also takes a list of PFNs that you'd
> need for non-coherent mappings?

VFIO has never supported that, so nothing like that exists yet. This
is sort of a first step toward some very basic support for
non-coherent cache flushing, in the limited case of a VM that can do
its own cache flushing through KVM.

The pfn list is needed for unpin_user_pages(), and it has an ugly
design where vfio/iommufd read back the pfns separately from unmap,
and they both do it differently without a common range-list
data structure here.

So we'd need to build some new unmap function that returns a pfn list
it internally fetches via the read ops. Then it can do the read,
unmap, IOTLB flush and cache flush in core code.

I've been working towards this very slowly, as I want to push this
stuff down into the IO page table walk and remove the significant
inefficiency, so it is not throw-away work, but it is certainly a
notable amount of work to do.

Jason
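
The unmap-and-flush flow described in the message above might look roughly
like the following sketch. iommu_unmap_and_flush(), struct pfn_list,
read_pfns() and domain_needs_cache_flush() are hypothetical names, used only
to illustrate the ordering (read back the PFNs, unmap and flush the IOTLB,
then flush the CPU cache before the pages are unpinned):

/*
 * Hypothetical core-code helper (all names invented for illustration):
 * gather the PFNs backing [iova, iova + size) via the domain's read ops,
 * unmap (which also flushes the IOTLB), then flush the CPU cache for each
 * PFN before the pages are handed back to the caller for unpinning.
 */
static size_t iommu_unmap_and_flush(struct iommu_domain *domain,
				    unsigned long iova, size_t size,
				    struct pfn_list *pfns)
{
	size_t unmapped;
	unsigned long i;

	read_pfns(domain, iova, size, pfns);		/* hypothetical read op */

	unmapped = iommu_unmap(domain, iova, size);

	if (domain_needs_cache_flush(domain))		/* i.e. non-coherent DMA */
		for (i = 0; i < pfns->nr; i++)
			arch_flush_cache_phys(PFN_PHYS(pfns->pfn[i]),
					      PAGE_SIZE);

	return unmapped;
}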
Yan Zhao June 6, 2024, 2:48 a.m. UTC | #9
On Sat, Jun 01, 2024 at 04:46:14PM -0300, Jason Gunthorpe wrote:
> On Mon, May 27, 2024 at 11:37:34PM -0700, Christoph Hellwig wrote:
> > On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> > > > > Err, no.  There should really be no exported cache manipulation macros,
> > > > > as drivers are almost guaranteed to get this wrong.  I've added
> > > > > Russell to the Cc list who has been extremtly vocal about this at least
> > > > > for arm.
> > > > 
> > > > We could possibly move this under some IOMMU core API (ie flush and
> > > > map, unmap and flush), the iommu APIs are non-modular so this could
> > > > avoid the exported symbol.
> > > 
> > > Though this would be pretty difficult for unmap as we don't have the
> > > pfns in the core code to flush. I don't think we have alot of good
> > > options but to make iommufd & VFIO handle this directly as they have
> > > the list of pages to flush on the unmap side. Use a namespace?
> > 
> > Just have a unmap version that also takes a list of PFNs that you'd
> > need for non-coherent mappings?
> 
> VFIO has never supported that so nothing like that exists yet.. This
> is sort of the first steps to some very basic support for a
> non-coherent cache flush in a limited case of a VM that can do its own
> cache flushing through kvm.
> 
> The pfn list is needed for unpin_user_pages() and it has an ugly
> design where vfio/iommufd read back the pfns seperately from unmap,
> and they both do it differently without a common range list
> datastructure here.
> 
> So, we'd need to build some new unmap function that returns a pfn list
> that it internally fetches via the read ops. Then it can do the read,
> unmap, flush iotlb, flush cache in core code.
Would the core code flush CPU caches given the page physical address?
If yes, do you think it's still necessary to export arch_flush_cache_phys()
(as implemented in this patch)?

> 
> I've been working towards this very slowly as I want to push this
> stuff down into the io page table walk and remove the significant
> inefficiency, so it is not throw away work, but it is certainly some
> notable amount of work to do.
Will VFIO also be switched to this new unmap interface? Do we need to care
about backporting?

And is it possible for VFIO alone to implement it in the way currently
proposed in this series, as a first step for easier backporting?
Jason Gunthorpe June 6, 2024, 11:55 a.m. UTC | #10
On Thu, Jun 06, 2024 at 10:48:10AM +0800, Yan Zhao wrote:
> On Sat, Jun 01, 2024 at 04:46:14PM -0300, Jason Gunthorpe wrote:
> > On Mon, May 27, 2024 at 11:37:34PM -0700, Christoph Hellwig wrote:
> > > On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> > > > > > Err, no.  There should really be no exported cache manipulation macros,
> > > > > > as drivers are almost guaranteed to get this wrong.  I've added
> > > > > > Russell to the Cc list who has been extremtly vocal about this at least
> > > > > > for arm.
> > > > > 
> > > > > We could possibly move this under some IOMMU core API (ie flush and
> > > > > map, unmap and flush), the iommu APIs are non-modular so this could
> > > > > avoid the exported symbol.
> > > > 
> > > > Though this would be pretty difficult for unmap as we don't have the
> > > > pfns in the core code to flush. I don't think we have alot of good
> > > > options but to make iommufd & VFIO handle this directly as they have
> > > > the list of pages to flush on the unmap side. Use a namespace?
> > > 
> > > Just have a unmap version that also takes a list of PFNs that you'd
> > > need for non-coherent mappings?
> > 
> > VFIO has never supported that so nothing like that exists yet.. This
> > is sort of the first steps to some very basic support for a
> > non-coherent cache flush in a limited case of a VM that can do its own
> > cache flushing through kvm.
> > 
> > The pfn list is needed for unpin_user_pages() and it has an ugly
> > design where vfio/iommufd read back the pfns seperately from unmap,
> > and they both do it differently without a common range list
> > datastructure here.
> > 
> > So, we'd need to build some new unmap function that returns a pfn list
> > that it internally fetches via the read ops. Then it can do the read,
> > unmap, flush iotlb, flush cache in core code.
> Would the core code flush CPU caches by providing page physical address?

The physical address is all we will have in the core code.

> If yes, do you think it's still necessary to export arch_flush_cache_phys()
> (as what's implemented in this patch)?

Christoph is asking not to export it; that would mean relying on the
iommu core being non-modular and putting the arch calls there, behind a
more restricted exported API - i.e. one based on unmap.

> > I've been working towards this very slowly as I want to push this
> > stuff down into the io page table walk and remove the significant
> > inefficiency, so it is not throw away work, but it is certainly some
> > notable amount of work to do.
> Will VFIO also be switched to this new unmap interface? Do we need to care
> about backporting?

I don't know :)
 
> And is it possible for VFIO alone to implement in the current proposed way
> in this series as the first step for easier backport?

I think this series is the best option we have right now, but make the
EXPORT an NS export to try to discourage abuse of it while we continue
working.

Jason
Yan Zhao June 7, 2024, 9:39 a.m. UTC | #11
On Thu, Jun 06, 2024 at 08:55:03AM -0300, Jason Gunthorpe wrote:
> On Thu, Jun 06, 2024 at 10:48:10AM +0800, Yan Zhao wrote:
> > On Sat, Jun 01, 2024 at 04:46:14PM -0300, Jason Gunthorpe wrote:
> > > On Mon, May 27, 2024 at 11:37:34PM -0700, Christoph Hellwig wrote:
> > > > On Tue, May 21, 2024 at 01:00:16PM -0300, Jason Gunthorpe wrote:
> > > > > > > Err, no.  There should really be no exported cache manipulation macros,
> > > > > > > as drivers are almost guaranteed to get this wrong.  I've added
> > > > > > > Russell to the Cc list who has been extremtly vocal about this at least
> > > > > > > for arm.
> > > > > > 
> > > > > > We could possibly move this under some IOMMU core API (ie flush and
> > > > > > map, unmap and flush), the iommu APIs are non-modular so this could
> > > > > > avoid the exported symbol.
> > > > > 
> > > > > Though this would be pretty difficult for unmap as we don't have the
> > > > > pfns in the core code to flush. I don't think we have alot of good
> > > > > options but to make iommufd & VFIO handle this directly as they have
> > > > > the list of pages to flush on the unmap side. Use a namespace?
> > > > 
> > > > Just have a unmap version that also takes a list of PFNs that you'd
> > > > need for non-coherent mappings?
> > > 
> > > VFIO has never supported that so nothing like that exists yet.. This
> > > is sort of the first steps to some very basic support for a
> > > non-coherent cache flush in a limited case of a VM that can do its own
> > > cache flushing through kvm.
> > > 
> > > The pfn list is needed for unpin_user_pages() and it has an ugly
> > > design where vfio/iommufd read back the pfns seperately from unmap,
> > > and they both do it differently without a common range list
> > > datastructure here.
> > > 
> > > So, we'd need to build some new unmap function that returns a pfn list
> > > that it internally fetches via the read ops. Then it can do the read,
> > > unmap, flush iotlb, flush cache in core code.
> > Would the core code flush CPU caches by providing page physical address?
> 
> Physical address is all we will have in the core code..
> 
> > If yes, do you think it's still necessary to export arch_flush_cache_phys()
> > (as what's implemented in this patch)?
> 
> Christoph is asking not to export it, that would mean relying on the
> iommu core to be non-modulare and putting the arch calls there with a
> more restricted exported API - ie based on unmap.

Got it. Thanks for the explanation!
> 
> > > I've been working towards this very slowly as I want to push this
> > > stuff down into the io page table walk and remove the significant
> > > inefficiency, so it is not throw away work, but it is certainly some
> > > notable amount of work to do.
> > Will VFIO also be switched to this new unmap interface? Do we need to care
> > about backporting?
> 
> I don't know :)
>  
> > And is it possible for VFIO alone to implement in the current proposed way
> > in this series as the first step for easier backport?
> 
> I think this series is the best option we have right now, but make the
> EXPORT a NS export to try to discourage abuse of it while we continue
> working
Will do. Thanks!
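
For reference, the namespaced export suggested above might look roughly as
below; the namespace name is made up for illustration only.

/* In arch/x86/mm/pat/set_memory.c, instead of a plain EXPORT_SYMBOL_GPL(): */
EXPORT_SYMBOL_NS_GPL(arch_flush_cache_phys, ARCH_CACHE_FLUSH);

/* Each legitimate user (e.g. vfio, iommufd) must then opt in explicitly: */
MODULE_IMPORT_NS(ARCH_CACHE_FLUSH);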

Patch

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index b192d917a6d0..b63607994285 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -10,4 +10,7 @@ 
 
 void clflush_cache_range(void *addr, unsigned int size);
 
+void arch_clean_nonsnoop_dma(phys_addr_t phys, size_t length);
+#define arch_clean_nonsnoop_dma arch_clean_nonsnoop_dma
+
 #endif /* _ASM_X86_CACHEFLUSH_H */
diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
index 80c9037ffadf..7ff08ad20369 100644
--- a/arch/x86/mm/pat/set_memory.c
+++ b/arch/x86/mm/pat/set_memory.c
@@ -34,6 +34,7 @@ 
 #include <asm/memtype.h>
 #include <asm/hyperv-tlfs.h>
 #include <asm/mshyperv.h>
+#include <asm/mtrr.h>
 
 #include "../mm_internal.h"
 
@@ -349,6 +350,93 @@  void arch_invalidate_pmem(void *addr, size_t size)
 EXPORT_SYMBOL_GPL(arch_invalidate_pmem);
 #endif
 
+/*
+ * Flush pfn_valid() and !PageReserved() page
+ */
+static void clflush_page(struct page *page)
+{
+	const int size = boot_cpu_data.x86_clflush_size;
+	unsigned int i;
+	void *va;
+
+	va = kmap_local_page(page);
+
+	/* CLFLUSHOPT is unordered and requires full memory barrier */
+	mb();
+	for (i = 0; i < PAGE_SIZE; i += size)
+		clflushopt(va + i);
+	/* CLFLUSHOPT is unordered and requires full memory barrier */
+	mb();
+
+	kunmap_local(va);
+}
+
+/*
+ * Flush a reserved page or !pfn_valid() PFN.
+ * Flush is not performed if the PFN is accessed in uncacheable type. i.e.
+ * - PAT type is UC/UC-/WC when PAT is enabled
+ * - MTRR type is UC/WC/WT/WP when PAT is not enabled.
+ *   (no need to do CLFLUSH though WT/WP is cacheable).
+ */
+static void clflush_reserved_or_invalid_pfn(unsigned long pfn)
+{
+	const int size = boot_cpu_data.x86_clflush_size;
+	unsigned int i;
+	void *va;
+
+	if (!pat_enabled()) {
+		u64 start = PFN_PHYS(pfn), end = start + PAGE_SIZE;
+		u8 mtrr_type, uniform;
+
+		mtrr_type = mtrr_type_lookup(start, end, &uniform);
+		if (mtrr_type != MTRR_TYPE_WRBACK)
+			return;
+	} else if (pat_pfn_immune_to_uc_mtrr(pfn)) {
+		return;
+	}
+
+	va = memremap(pfn << PAGE_SHIFT, PAGE_SIZE, MEMREMAP_WB);
+	if (!va)
+		return;
+
+	/* CLFLUSHOPT is unordered and requires full memory barrier */
+	mb();
+	for (i = 0; i < PAGE_SIZE; i += size)
+		clflushopt(va + i);
+	/* CLFLUSHOPT is unordered and requires full memory barrier */
+	mb();
+
+	memunmap(va);
+}
+
+static inline void clflush_pfn(unsigned long pfn)
+{
+	if (pfn_valid(pfn) &&
+	    (!PageReserved(pfn_to_page(pfn)) || is_zero_pfn(pfn)))
+		return clflush_page(pfn_to_page(pfn));
+
+	clflush_reserved_or_invalid_pfn(pfn);
+}
+
+/**
+ * arch_clean_nonsnoop_dma - flush a cache range for non-coherent DMAs
+ *                           (DMAs that lack CPU cache snooping).
+ * @phys_addr:	physical address start
+ * @length:	number of bytes to flush
+ */
+void arch_clean_nonsnoop_dma(phys_addr_t phys_addr, size_t length)
+{
+	unsigned long nrpages, pfn;
+	unsigned long i;
+
+	pfn = PHYS_PFN(phys_addr);
+	nrpages = PAGE_ALIGN((phys_addr & ~PAGE_MASK) + length) >> PAGE_SHIFT;
+
+	for (i = 0; i < nrpages; i++, pfn++)
+		clflush_pfn(pfn);
+}
+EXPORT_SYMBOL_GPL(arch_clean_nonsnoop_dma);
+
 #ifdef CONFIG_ARCH_HAS_CPU_CACHE_INVALIDATE_MEMREGION
 bool cpu_cache_has_invalidate_memregion(void)
 {
diff --git a/include/linux/cacheflush.h b/include/linux/cacheflush.h
index 55f297b2c23f..0bfc6551c6d3 100644
--- a/include/linux/cacheflush.h
+++ b/include/linux/cacheflush.h
@@ -26,4 +26,10 @@  static inline void flush_icache_pages(struct vm_area_struct *vma,
 
 #define flush_icache_page(vma, page)	flush_icache_pages(vma, page, 1)
 
+#ifndef arch_clean_nonsnoop_dma
+static inline void arch_clean_nonsnoop_dma(phys_addr_t phys, size_t length)
+{
+}
+#endif
+
 #endif /* _LINUX_CACHEFLUSH_H */