diff mbox series

[v2,1/1] swiotlb: Reduce swiotlb pool lookups

Message ID 20240701165746.1358-1-mhklinux@outlook.com (mailing list archive)
State Superseded
Headers show
Series [v2,1/1] swiotlb: Reduce swiotlb pool lookups | expand

Commit Message

Michael Kelley July 1, 2024, 4:57 p.m. UTC
From: Michael Kelley <mhklinux@outlook.com>

With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must
found again in the next function that is called because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce

dma_direct_unmap_page:
2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
	swiotlb_bounce
4. is_swiotlb_buffer
5. swiotlb_tbl_unmap_single->swiotlb_del_transient
6. swiotlb_tbl_unmap_single->swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.

Since is_swiotlb_buffer() no longer returns a boolean, rename some
functions to reflect the change:
* swiotlb_find_pool() becomes __swiotlb_find_pool()
* is_swiotlb_buffer() becomes swiotlb_find_pool()
* is_xen_swiotlb_buffer() becomes xen_swiotlb_find_pool()

With these changes, a round-trip map/unmap pair requires only 2 pool
lookups (listed using the new names):

dma_direct_unmap_page:
1. dma_direct_sync_single_for_cpu->swiotlb_find_pool
2. swiotlb_find_pool

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
__swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.

No functional change is intended.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
Changes in v2 vs. RFC version[1]:
* In swiotlb_find_pool(), use __swiotlb_find_pool() instead of open
  coding when CONFIG_SWIOTLB_DYNAMIC is not set [Petr Tesařík]
* Rename functions as described in the commit message to reflect that
  is_swiotlb_buffer() no longer returns a boolean [Petr Tesařík,
  Christoph Hellwig]
* Updated commit message and patch Subject

[1] https://lore.kernel.org/linux-iommu/20240607031421.182589-1-mhklinux@outlook.com/

 drivers/iommu/dma-iommu.c   | 27 ++++++++++-----
 drivers/xen/swiotlb-xen.c   | 27 +++++++++------
 include/linux/scatterlist.h |  2 +-
 include/linux/swiotlb.h     | 68 ++++++++++++++++++++-----------------
 kernel/dma/direct.c         | 14 +++++---
 kernel/dma/direct.h         | 18 ++++++----
 kernel/dma/swiotlb.c        | 53 +++++++++++++++--------------
 7 files changed, 122 insertions(+), 87 deletions(-)

Comments

Christoph Hellwig July 6, 2024, 5:50 a.m. UTC | #1
Hi Michael,

I like the idea behind this, but can you respin it to avoid some of
the added code duplication.  We have a lot of this pattern:

	pool = swiotlb_find_pool(dev, paddr);
	if (pool)
		swiotlb_foo(dev, ...

duplicated in all three swiotlb users.  If we rename the original
swiotlb_foo to __swiotlb_foo and add a little inline wrapper this is
de-duplicated and also avoids exposing swiotlb_find_pool to the
callers.

If we then stub out swiotlb_find_pool to return NULL for !CONFIG_SWIOTLB,
we also don't need extra stubs for all the __swiotlb_ helpers as the
compiler will eliminate the calls as dead code.

I might be missing something, but what is the reason for using the
lower-level __swiotlb_find_pool in swiotlb_map and xen_swiotlb_map_page?
I can't see a reason why the simple checks in swiotlb_find_pool itself
are either wrong or a performance problem there.  Because if we don't
need these separate calls we can do away with __swiotlb_find_pool
for !CONFIG_SWIOTLB_DYNAMIC and simplify swiotlb_find_pool quite
a bit like this:

	...

	if (!mem)
		return NULL;

	if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
		smp_rmb(); 
		if (!READ_ONCE(dev->dma_uses_io_tlb))
			return NULL;
		return __swiotlb_find_pool(dev, paddr);
	}

	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
		return NULL;
	return &dev->dma_io_tlb_mem->defpool;


While you're at it please fix the > 80 character lines as this patch
is adding plenty.
Michael Kelley July 7, 2024, 2:11 a.m. UTC | #2
From: Christoph Hellwig <hch@lst.de> Sent: Friday, July 5, 2024 10:50 PM
> 
> Hi Michael,
> 
> I like the idea behind this, but can you respin it to avoid some of
> the added code duplication.  We have a lot of this pattern:
> 
> 	pool = swiotlb_find_pool(dev, paddr);
> 	if (pool)
> 		swiotlb_foo(dev, ...
> 
> duplicated in all three swiotlb users.  If we rename the original
> swiotlb_foo to __swiotlb_foo and add a little inline wrapper this is
> de-duplicated and also avoids exposing swiotlb_find_pool to the
> callers.

This works pretty well. It certainly avoids the messiness of declaring
a "pool" local variable and needing a separate assignment before the
"if" statement, in each of the 9 call sites. The small downside is that
it looks like a swiotlb function is called every time, even though
there's usually an inline bailout. But that pattern occurs throughout
the kernel, so not a big deal.

I initially coded this change as a separate patch that goes first. But
the second patch ends up changing about 20 lines that are changed
by the first patch. It's hard to cleanly tease them apart. So I've gone
back to a single unified patch. But let me know if you think it's worth
the extra churn to break them apart.

> 
> If we then stub out swiotlb_find_pool to return NULL for !CONFIG_SWIOTLB,
> we also don't need extra stubs for all the __swiotlb_ helpers as the
> compiler will eliminate the calls as dead code.

Yes, this works as long as the declarations for the __swiotlb_foo
functions are *not* under CONFIG_SWIOTLB. But when compiling with
!CONFIG_SWIOTLB on arm64 with gcc-8.5.0, two tangentially related
compile errors occur. iommu_dma_map_page() references
swiotlb_tlb_map_single(). The declaration for the latter is under
CONFIG_SWIOTLB. A similar problem occurs with dma_direct_map_page()
and swiotlb_map(). Do later versions of gcc not complain when the
reference is in dead code? Or are these just bugs that occurred because
!CONFIG_SWIOTLB is rare? If the latter, I can submit a separate patch to
move the declarations out from under CONFIG_SWIOTLB.

> 
> I might be missing something, but what is the reason for using the
> lower-level __swiotlb_find_pool in swiotlb_map and xen_swiotlb_map_page?
> I can't see a reason why the simple checks in swiotlb_find_pool itself
> are either wrong or a performance problem there.  

Yes, swiotlb_find_pool() could be used instead of __swiotlb_find_pool().

> Because if we don't
> need these separate calls we can do away with __swiotlb_find_pool
> for !CONFIG_SWIOTLB_DYNAMIC and simplify swiotlb_find_pool quite
> a bit like this:
> 
> 	...
> 
> 	if (!mem)
> 		return NULL;
> 
> 	if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {

The "IS_ENABLED" doesn't work because the dma_uses_io_tlb
field in struct dev is under CONFIG_SWIOTLB_DYNAMIC. I guess
it could be moved out, but that's going further afield. So I'm back
to using #ifdef.

> 		smp_rmb();
> 		if (!READ_ONCE(dev->dma_uses_io_tlb))
> 			return NULL;
> 		return __swiotlb_find_pool(dev, paddr);
> 	}
> 
> 	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
> 		return NULL;
> 	return &dev->dma_io_tlb_mem->defpool;

Petr Tesařík had commented [1] on my original RFC suggesting that
__swiotlb_find_pool() be used here instead of open coding it. With
the changes you suggest, __swiotlb_find_pool() is needed only in
the CONFIG_SWIOTLB_DYNAMIC case, and I would be fine with just
open coding the address of defpool here. Petr -- are you OK with
removing __swiotlb_find_pool when !CONFIG_SWIOTLB_DYNAMIC,
since this is the only place it would be used?

> 
> 
> While you're at it please fix the > 80 character lines as this patch
> is adding plenty.

Many cases already go away with your first suggestion above,
but I'll fix the others.

Michael

[1] https://lore.kernel.org/linux-iommu/20240627092049.1dbec746@meshulam.tesarici.cz/
Michael Kelley July 7, 2024, 2:35 a.m. UTC | #3
From: Michael Kelley Sent: Saturday, July 6, 2024 7:12 PM

[ ... ]
> >
> > If we then stub out swiotlb_find_pool to return NULL for !CONFIG_SWIOTLB,
> > we also don't need extra stubs for all the __swiotlb_ helpers as the
> > compiler will eliminate the calls as dead code.
> 
> Yes, this works as long as the declarations for the __swiotlb_foo
> functions are *not* under CONFIG_SWIOTLB. But when compiling with
> !CONFIG_SWIOTLB on arm64 with gcc-8.5.0, two tangentially related
> compile errors occur. iommu_dma_map_page() references
> swiotlb_tlb_map_single(). The declaration for the latter is under
> CONFIG_SWIOTLB. A similar problem occurs with dma_direct_map_page()
> and swiotlb_map(). Do later versions of gcc not complain when the
> reference is in dead code? Or are these just bugs that occurred because
> !CONFIG_SWIOTLB is rare? If the latter, I can submit a separate patch to
> move the declarations out from under CONFIG_SWIOTLB.
> 

Ignore the "two tangentially related compile errors".  I schlepped some
code around incorrectly and caused the problem myself. :-(

Michael
Christoph Hellwig July 7, 2024, 6:36 a.m. UTC | #4
On Sun, Jul 07, 2024 at 02:11:48AM +0000, Michael Kelley wrote:
> This works pretty well. It certainly avoids the messiness of declaring
> a "pool" local variable and needing a separate assignment before the
> "if" statement, in each of the 9 call sites. The small downside is that
> it looks like a swiotlb function is called every time, even though
> there's usually an inline bailout. But that pattern occurs throughout
> the kernel, so not a big deal.
> 
> I initially coded this change as a separate patch that goes first. But
> the second patch ends up changing about 20 lines that are changed
> by the first patch. It's hard to cleanly tease them apart. So I've gone
> back to a single unified patch. But let me know if you think it's worth
> the extra churn to break them apart.

I think it's perfectly fine to keep one big patch.

> Yes, this works as long as the declarations for the __swiotlb_foo
> functions are *not* under CONFIG_SWIOTLB. But when compiling with
> !CONFIG_SWIOTLB on arm64 with gcc-8.5.0, two tangentially related
> compile errors occur. iommu_dma_map_page() references
> swiotlb_tlb_map_single(). The declaration for the latter is under
> CONFIG_SWIOTLB. A similar problem occurs with dma_direct_map_page()
> and swiotlb_map(). Do later versions of gcc not complain when the
> reference is in dead code? Or are these just bugs that occurred because
> !CONFIG_SWIOTLB is rare? If the latter, I can submit a separate patch to
> move the declarations out from under CONFIG_SWIOTLB.

A reference to dead code is fine as long as the condition is a compile
time one.  I think your next mail sugests you've sorted this out, but
if not let me know or just shared your current work in progress patch.

> > 	if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {
> 
> The "IS_ENABLED" doesn't work because the dma_uses_io_tlb
> field in struct dev is under CONFIG_SWIOTLB_DYNAMIC. I guess
> it could be moved out, but that's going further afield. So I'm back
> to using #ifdef.

Yes, we'd need the field definition.

> Petr Tesařík had commented [1] on my original RFC suggesting that
> __swiotlb_find_pool() be used here instead of open coding it. With
> the changes you suggest, __swiotlb_find_pool() is needed only in
> the CONFIG_SWIOTLB_DYNAMIC case, and I would be fine with just
> open coding the address of defpool here. Petr -- are you OK with
> removing __swiotlb_find_pool when !CONFIG_SWIOTLB_DYNAMIC,
> since this is the only place it would be used?

Yes.  That was indeed the intent behind the suggstion.
Petr Tesařík July 8, 2024, 4:28 a.m. UTC | #5
On Sun, 7 Jul 2024 02:11:48 +0000
Michael Kelley <mhklinux@outlook.com> wrote:

> From: Christoph Hellwig <hch@lst.de> Sent: Friday, July 5, 2024 10:50 PM
> > 
> > Hi Michael,
> > 
> > I like the idea behind this, but can you respin it to avoid some of
> > the added code duplication.  We have a lot of this pattern:
> > 
> > 	pool = swiotlb_find_pool(dev, paddr);
> > 	if (pool)
> > 		swiotlb_foo(dev, ...
> > 
> > duplicated in all three swiotlb users.  If we rename the original
> > swiotlb_foo to __swiotlb_foo and add a little inline wrapper this is
> > de-duplicated and also avoids exposing swiotlb_find_pool to the
> > callers.  
> 
> This works pretty well. It certainly avoids the messiness of declaring
> a "pool" local variable and needing a separate assignment before the
> "if" statement, in each of the 9 call sites. The small downside is that
> it looks like a swiotlb function is called every time, even though
> there's usually an inline bailout. But that pattern occurs throughout
> the kernel, so not a big deal.
> 
> I initially coded this change as a separate patch that goes first. But
> the second patch ends up changing about 20 lines that are changed
> by the first patch. It's hard to cleanly tease them apart. So I've gone
> back to a single unified patch. But let me know if you think it's worth
> the extra churn to break them apart.
> 
> > 
> > If we then stub out swiotlb_find_pool to return NULL for !CONFIG_SWIOTLB,
> > we also don't need extra stubs for all the __swiotlb_ helpers as the
> > compiler will eliminate the calls as dead code.  
> 
> Yes, this works as long as the declarations for the __swiotlb_foo
> functions are *not* under CONFIG_SWIOTLB. But when compiling with
> !CONFIG_SWIOTLB on arm64 with gcc-8.5.0, two tangentially related
> compile errors occur. iommu_dma_map_page() references
> swiotlb_tlb_map_single(). The declaration for the latter is under
> CONFIG_SWIOTLB. A similar problem occurs with dma_direct_map_page()
> and swiotlb_map(). Do later versions of gcc not complain when the
> reference is in dead code? Or are these just bugs that occurred because
> !CONFIG_SWIOTLB is rare? If the latter, I can submit a separate patch to
> move the declarations out from under CONFIG_SWIOTLB.
> 
> > 
> > I might be missing something, but what is the reason for using the
> > lower-level __swiotlb_find_pool in swiotlb_map and xen_swiotlb_map_page?
> > I can't see a reason why the simple checks in swiotlb_find_pool itself
> > are either wrong or a performance problem there.    
> 
> Yes, swiotlb_find_pool() could be used instead of __swiotlb_find_pool().
> 
> > Because if we don't
> > need these separate calls we can do away with __swiotlb_find_pool
> > for !CONFIG_SWIOTLB_DYNAMIC and simplify swiotlb_find_pool quite
> > a bit like this:
> > 
> > 	...
> > 
> > 	if (!mem)
> > 		return NULL;
> > 
> > 	if (IS_ENABLED(CONFIG_SWIOTLB_DYNAMIC)) {  
> 
> The "IS_ENABLED" doesn't work because the dma_uses_io_tlb
> field in struct dev is under CONFIG_SWIOTLB_DYNAMIC. I guess
> it could be moved out, but that's going further afield. So I'm back
> to using #ifdef.
> 
> > 		smp_rmb();
> > 		if (!READ_ONCE(dev->dma_uses_io_tlb))
> > 			return NULL;
> > 		return __swiotlb_find_pool(dev, paddr);
> > 	}
> > 
> > 	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
> > 		return NULL;
> > 	return &dev->dma_io_tlb_mem->defpool;  
> 
> Petr Tesařík had commented [1] on my original RFC suggesting that
> __swiotlb_find_pool() be used here instead of open coding it. With
> the changes you suggest, __swiotlb_find_pool() is needed only in
> the CONFIG_SWIOTLB_DYNAMIC case, and I would be fine with just
> open coding the address of defpool here. Petr -- are you OK with
> removing __swiotlb_find_pool when !CONFIG_SWIOTLB_DYNAMIC,
> since this is the only place it would be used?

Yes. I have never had strong opinion about it, I merely saw the
opportunity when it was low-hanging fruit, but it's definitely not
worth adding complexity.

Petr T
diff mbox series

Patch

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 43520e7275cc..0e2a9b043ca6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1073,6 +1073,7 @@  static void iommu_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
@@ -1081,21 +1082,25 @@  static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu(phys, size, dir);
 
-	if (is_swiotlb_buffer(dev, phys))
-		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
+	pool = swiotlb_find_pool(dev, phys);
+	if (pool)
+		swiotlb_sync_single_for_cpu(dev, phys, size, dir, pool);
 }
 
 static void iommu_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-	if (is_swiotlb_buffer(dev, phys))
-		swiotlb_sync_single_for_device(dev, phys, size, dir);
+
+	pool = swiotlb_find_pool(dev, phys);
+	if (pool)
+		swiotlb_sync_single_for_device(dev, phys, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_device(phys, size, dir);
@@ -1189,8 +1194,12 @@  static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		arch_sync_dma_for_device(phys, size, dir);
 
 	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	if (iova == DMA_MAPPING_ERROR) {
+		struct io_tlb_pool *pool = swiotlb_find_pool(dev, phys);
+
+		if (pool)
+			swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
+	}
 	return iova;
 }
 
@@ -1199,6 +1208,7 @@  static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	phys = iommu_iova_to_phys(domain, dma_handle);
 	if (WARN_ON(!phys))
@@ -1209,8 +1219,9 @@  static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 
 	__iommu_dma_unmap(dev, dma_handle, size);
 
-	if (unlikely(is_swiotlb_buffer(dev, phys)))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	pool = swiotlb_find_pool(dev, phys);
+	if (unlikely(pool))
+		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
 }
 
 /*
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6579ae3f6dac..1696941634f1 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -88,7 +88,7 @@  static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
 	return 0;
 }
 
-static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
+static struct io_tlb_pool *xen_swiotlb_find_pool(struct device *dev, dma_addr_t dma_addr)
 {
 	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
 	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
@@ -99,8 +99,8 @@  static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 	 * in our domain. Therefore _only_ check address within our domain.
 	 */
 	if (pfn_valid(PFN_DOWN(paddr)))
-		return is_swiotlb_buffer(dev, paddr);
-	return 0;
+		return swiotlb_find_pool(dev, paddr);
+	return NULL;
 }
 
 #ifdef CONFIG_X86
@@ -228,7 +228,8 @@  static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	if (unlikely(!dma_capable(dev, dev_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, map, size, dir,
-				attrs | DMA_ATTR_SKIP_CPU_SYNC);
+				attrs | DMA_ATTR_SKIP_CPU_SYNC,
+				__swiotlb_find_pool(dev, map));
 		return DMA_MAPPING_ERROR;
 	}
 
@@ -254,6 +255,7 @@  static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
+	struct io_tlb_pool *pool;
 
 	BUG_ON(dir == DMA_NONE);
 
@@ -265,8 +267,9 @@  static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 	}
 
 	/* NOTE: We use dev_addr here, not paddr! */
-	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
-		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+	pool = xen_swiotlb_find_pool(hwdev, dev_addr);
+	if (pool)
+		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs, pool);
 }
 
 static void
@@ -274,6 +277,7 @@  xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
+	struct io_tlb_pool *pool;
 
 	if (!dev_is_dma_coherent(dev)) {
 		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
@@ -282,8 +286,9 @@  xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
 	}
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
-		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
+	pool = xen_swiotlb_find_pool(dev, dma_addr);
+	if (pool)
+		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
 }
 
 static void
@@ -291,9 +296,11 @@  xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
+	struct io_tlb_pool *pool;
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
-		swiotlb_sync_single_for_device(dev, paddr, size, dir);
+	pool = xen_swiotlb_find_pool(dev, dma_addr);
+	if (pool)
+		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev)) {
 		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index 77df3d7b18a6..e61d164622db 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -332,7 +332,7 @@  static inline void sg_dma_unmark_bus_address(struct scatterlist *sg)
  * Description:
  *   Returns true if the scatterlist was marked for SWIOTLB bouncing. Not all
  *   elements may have been bounced, so the caller would have to check
- *   individual SG entries with is_swiotlb_buffer().
+ *   individual SG entries with swiotlb_find_pool().
  */
 static inline bool sg_dma_is_swiotlb(struct scatterlist *sg)
 {
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 14bc10c1bb23..486882fe3752 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -42,24 +42,6 @@  int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	int (*remap)(void *tlb, unsigned long nslabs));
 extern void __init swiotlb_update_mem_attributes(void);
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
-		size_t mapping_size,
-		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
-		unsigned long attrs);
-
-extern void swiotlb_tbl_unmap_single(struct device *hwdev,
-				     phys_addr_t tlb_addr,
-				     size_t mapping_size,
-				     enum dma_data_direction dir,
-				     unsigned long attrs);
-
-void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir);
-void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir);
-dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
-
 #ifdef CONFIG_SWIOTLB
 
 /**
@@ -145,11 +127,11 @@  struct io_tlb_mem {
 
 #ifdef CONFIG_SWIOTLB_DYNAMIC
 
-struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
+struct io_tlb_pool *__swiotlb_find_pool(struct device *dev, phys_addr_t paddr);
 
 #else
 
-static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
+static inline struct io_tlb_pool *__swiotlb_find_pool(struct device *dev,
 						    phys_addr_t paddr)
 {
 	return &dev->dma_io_tlb_mem->defpool;
@@ -158,22 +140,23 @@  static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
 #endif
 
 /**
- * is_swiotlb_buffer() - check if a physical address belongs to a swiotlb
+ * swiotlb_find_pool() - find the swiotlb pool to which a physical address belongs
  * @dev:        Device which has mapped the buffer.
  * @paddr:      Physical address within the DMA buffer.
  *
- * Check if @paddr points into a bounce buffer.
+ * Find the swiotlb pool that @paddr points into.
  *
  * Return:
- * * %true if @paddr points into a bounce buffer
- * * %false otherwise
+ * * pool address if @paddr points into a bounce buffer
+ * * NULL if @paddr does not point into a bounce buffer. As such, this function
+ *   can be used to determine if @paddr denotes a swiotlb bounce buffer.
  */
-static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
 	if (!mem)
-		return false;
+		return NULL;
 
 #ifdef CONFIG_SWIOTLB_DYNAMIC
 	/*
@@ -182,16 +165,18 @@  static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 	 * If a SWIOTLB address is checked on another CPU, then it was
 	 * presumably loaded by the device driver from an unspecified private
 	 * data structure. Make sure that this load is ordered before reading
-	 * dev->dma_uses_io_tlb here and mem->pools in swiotlb_find_pool().
+	 * dev->dma_uses_io_tlb here and mem->pools in __swiotlb_find_pool().
 	 *
 	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
 	 */
 	smp_rmb();
-	return READ_ONCE(dev->dma_uses_io_tlb) &&
-		swiotlb_find_pool(dev, paddr);
+	if (!READ_ONCE(dev->dma_uses_io_tlb))
+		return NULL;
 #else
-	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
+	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
+		return NULL;
 #endif
+	return __swiotlb_find_pool(dev, paddr);
 }
 
 static inline bool is_swiotlb_force_bounce(struct device *dev)
@@ -201,6 +186,25 @@  static inline bool is_swiotlb_force_bounce(struct device *dev)
 	return mem && mem->force_bounce;
 }
 
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
+		size_t mapping_size,
+		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
+		unsigned long attrs);
+
+extern void swiotlb_tbl_unmap_single(struct device *hwdev,
+				     phys_addr_t tlb_addr,
+				     size_t mapping_size,
+				     enum dma_data_direction dir,
+				     unsigned long attrs,
+				     struct io_tlb_pool *pool);
+
+void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
+void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
+dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
+
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
 void swiotlb_dev_init(struct device *dev);
@@ -219,9 +223,9 @@  static inline void swiotlb_dev_init(struct device *dev)
 {
 }
 
-static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
+static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
 {
-	return false;
+	return NULL;
 }
 static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4d543b1e9d57..edb126179390 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -399,14 +399,16 @@  void dma_direct_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
+	struct io_tlb_pool *pool;
 	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-		if (unlikely(is_swiotlb_buffer(dev, paddr)))
+		pool = swiotlb_find_pool(dev, paddr);
+		if (unlikely(pool))
 			swiotlb_sync_single_for_device(dev, paddr, sg->length,
-						       dir);
+						       dir, pool);
 
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_device(paddr, sg->length,
@@ -422,6 +424,7 @@  void dma_direct_sync_sg_for_cpu(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
+	struct io_tlb_pool *pool;
 	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
@@ -430,9 +433,10 @@  void dma_direct_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(paddr, sg->length, dir);
 
-		if (unlikely(is_swiotlb_buffer(dev, paddr)))
+		pool = swiotlb_find_pool(dev, paddr);
+		if (unlikely(pool))
 			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
-						    dir);
+						    dir, pool);
 
 		if (dir == DMA_FROM_DEVICE)
 			arch_dma_mark_clean(paddr, sg->length);
@@ -640,7 +644,7 @@  size_t dma_direct_max_mapping_size(struct device *dev)
 bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
 	return !dev_is_dma_coherent(dev) ||
-	       is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
+	       swiotlb_find_pool(dev, dma_to_phys(dev, dma_addr));
 }
 
 /**
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 18d346118fe8..6067120d7a42 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -57,9 +57,11 @@  static inline void dma_direct_sync_single_for_device(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
-	if (unlikely(is_swiotlb_buffer(dev, paddr)))
-		swiotlb_sync_single_for_device(dev, paddr, size, dir);
+	pool = swiotlb_find_pool(dev, paddr);
+	if (unlikely(pool))
+		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_device(paddr, size, dir);
@@ -69,14 +71,16 @@  static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
 	if (!dev_is_dma_coherent(dev)) {
 		arch_sync_dma_for_cpu(paddr, size, dir);
 		arch_sync_dma_for_cpu_all();
 	}
 
-	if (unlikely(is_swiotlb_buffer(dev, paddr)))
-		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
+	pool = swiotlb_find_pool(dev, paddr);
+	if (unlikely(pool))
+		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
 
 	if (dir == DMA_FROM_DEVICE)
 		arch_dma_mark_clean(paddr, size);
@@ -117,12 +121,14 @@  static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t phys = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
-	if (unlikely(is_swiotlb_buffer(dev, phys)))
+	pool = swiotlb_find_pool(dev, phys);
+	if (unlikely(pool))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir,
-					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
+					 attrs | DMA_ATTR_SKIP_CPU_SYNC, pool);
 }
 #endif /* _KERNEL_DMA_DIRECT_H */
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fe1ccb53596f..9e58ceca3d0b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -763,16 +763,18 @@  static void swiotlb_dyn_free(struct rcu_head *rcu)
 }
 
 /**
- * swiotlb_find_pool() - find the IO TLB pool for a physical address
+ * __swiotlb_find_pool() - find the IO TLB pool for a physical address
  * @dev:        Device which has mapped the DMA buffer.
  * @paddr:      Physical address within the DMA buffer.
  *
  * Find the IO TLB memory pool descriptor which contains the given physical
- * address, if any.
+ * address, if any. This function is for use only when the dev is known to
+ * be using swiotlb. Use swiotlb_find_pool() for the more general case
+ * when this condition is not met.
  *
  * Return: Memory pool which contains @paddr, or %NULL if none.
  */
-struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
+struct io_tlb_pool *__swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 	struct io_tlb_pool *pool;
@@ -855,9 +857,8 @@  static unsigned int swiotlb_align_offset(struct device *dev,
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
-			   enum dma_data_direction dir)
+			   enum dma_data_direction dir, struct io_tlb_pool *mem)
 {
-	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
@@ -1243,7 +1244,7 @@  static int swiotlb_find_slots(struct device *dev, phys_addr_t orig_addr,
 	 * that was made by swiotlb_dyn_alloc() on a third CPU (cf. multicopy
 	 * atomicity).
 	 *
-	 * See also the comment in is_swiotlb_buffer().
+	 * See also the comment in swiotlb_find_pool().
 	 */
 	smp_mb();
 
@@ -1435,13 +1436,13 @@  phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	 * hardware behavior.  Use of swiotlb is supposed to be transparent,
 	 * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
 	 */
-	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE, pool);
 	return tlb_addr;
 }
 
-static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr,
+				  struct io_tlb_pool *mem)
 {
-	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
 	unsigned long flags;
 	unsigned int offset = swiotlb_align_offset(dev, 0, tlb_addr);
 	int index, nslots, aindex;
@@ -1505,11 +1506,9 @@  static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
  *
  * Return: %true if @tlb_addr belonged to a transient pool that was released.
  */
-static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
+static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr,
+				  struct io_tlb_pool *pool)
 {
-	struct io_tlb_pool *pool;
-
-	pool = swiotlb_find_pool(dev, tlb_addr);
 	if (!pool->transient)
 		return false;
 
@@ -1522,7 +1521,8 @@  static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
 #else  /* !CONFIG_SWIOTLB_DYNAMIC */
 
 static inline bool swiotlb_del_transient(struct device *dev,
-					 phys_addr_t tlb_addr)
+					 phys_addr_t tlb_addr,
+					 struct io_tlb_pool *pool)
 {
 	return false;
 }
@@ -1534,34 +1534,34 @@  static inline bool swiotlb_del_transient(struct device *dev,
  */
 void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
 			      size_t mapping_size, enum dma_data_direction dir,
-			      unsigned long attrs)
+			      unsigned long attrs, struct io_tlb_pool *pool)
 {
 	/*
 	 * First, sync the memory before unmapping the entry
 	 */
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE, pool);
 
-	if (swiotlb_del_transient(dev, tlb_addr))
+	if (swiotlb_del_transient(dev, tlb_addr, pool))
 		return;
-	swiotlb_release_slots(dev, tlb_addr);
+	swiotlb_release_slots(dev, tlb_addr, pool);
 }
 
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir)
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
 {
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
-		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE, pool);
 	else
 		BUG_ON(dir != DMA_FROM_DEVICE);
 }
 
 void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir)
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
 {
 	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
-		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE, pool);
 	else
 		BUG_ON(dir != DMA_TO_DEVICE);
 }
@@ -1586,7 +1586,8 @@  dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
-			attrs | DMA_ATTR_SKIP_CPU_SYNC);
+			attrs | DMA_ATTR_SKIP_CPU_SYNC,
+			__swiotlb_find_pool(dev, swiotlb_addr));
 		dev_WARN_ONCE(dev, 1,
 			"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
 			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
@@ -1774,11 +1775,13 @@  struct page *swiotlb_alloc(struct device *dev, size_t size)
 bool swiotlb_free(struct device *dev, struct page *page, size_t size)
 {
 	phys_addr_t tlb_addr = page_to_phys(page);
+	struct io_tlb_pool *pool;
 
-	if (!is_swiotlb_buffer(dev, tlb_addr))
+	pool = swiotlb_find_pool(dev, tlb_addr);
+	if (!pool)
 		return false;
 
-	swiotlb_release_slots(dev, tlb_addr);
+	swiotlb_release_slots(dev, tlb_addr, pool);
 
 	return true;
 }