[RFC,1/1] swiotlb: Reduce calls to swiotlb_find_pool()

Message ID 20240607031421.182589-1-mhklinux@outlook.com (mailing list archive)
State New, archived
Series [RFC,1/1] swiotlb: Reduce calls to swiotlb_find_pool()

Commit Message

Michael Kelley June 7, 2024, 3:14 a.m. UTC
From: Michael Kelley <mhklinux@outlook.com>

With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
places, the pool is found and used in one function, and then must be
found again in the next function that is called, because only the
tlb_addr is passed as an argument. These are the six call sites:

dma_direct_map_page:
1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce

dma_direct_unmap_page:
2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
	swiotlb_bounce
4. is_swiotlb_buffer
5. swiotlb_tbl_unmap_single->swiotlb_del_transient
6. swiotlb_tbl_unmap_single->swiotlb_release_slots

Reduce the number of calls by finding the pool at a higher level, and
passing it as an argument instead of searching again. A key change is
for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
and then pass this pool pointer to subsequent swiotlb functions.
With these changes, a round-trip map/unmap pair requires only 2 calls
to swiotlb_find_pool():

dma_direct_unmap_page:
1. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
2. is_swiotlb_buffer
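
For illustration, a typical call-site change (taken from the diff below)
replaces a boolean check plus a repeated lookup inside the callee:

	if (is_swiotlb_buffer(dev, paddr))
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);

with a single lookup whose result is passed along:

	pool = is_swiotlb_buffer(dev, paddr);
	if (pool)
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);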

These changes come from noticing the inefficiencies in a code review,
not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
so avoiding the redundant calls helps performance in a hot path.
When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
is minimal and the perf benefits are likely negligible, but no
harm is done.
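
For reference, here is a simplified sketch of what each dynamic-case
lookup involves (condensed from kernel/dma/swiotlb.c; details of the
real function may differ):

	struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr)
	{
		struct io_tlb_pool *pool;

		rcu_read_lock();
		/* Walks the RCU-protected pool list; the real code also
		 * walks per-device transient pools.
		 */
		list_for_each_entry_rcu(pool, &dev->dma_io_tlb_mem->pools, node) {
			if (paddr >= pool->start && paddr < pool->end)
				goto out;
		}
		pool = NULL;
	out:
		rcu_read_unlock();
		return pool;
	}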

No functional change is intended.

Signed-off-by: Michael Kelley <mhklinux@outlook.com>
---
This patch trades off making many of the core swiotlb APIs take
an additional argument in order to avoid duplicating calls to
swiotlb_find_pool(). The current code seems rather wasteful in
making 6 calls per round-trip, but I'm happy to accept others'
judgment as to whether getting rid of the waste is worth the
additional code complexity.

 drivers/iommu/dma-iommu.c | 27 ++++++++++++++------
 drivers/xen/swiotlb-xen.c | 25 +++++++++++-------
 include/linux/swiotlb.h   | 54 +++++++++++++++++++++------------------
 kernel/dma/direct.c       | 12 ++++++---
 kernel/dma/direct.h       | 18 ++++++++-----
 kernel/dma/swiotlb.c      | 43 ++++++++++++++++---------------
 6 files changed, 106 insertions(+), 73 deletions(-)

Comments

Michael Kelley June 26, 2024, 11:58 p.m. UTC | #1
From: mhkelley58@gmail.com <mhkelley58@gmail.com> Sent: Thursday, June 6, 2024 8:14 PM
> 
> With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
> in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
> places, the pool is found and used in one function, and then must be
> found again in the next function that is called, because only the
> tlb_addr is passed as an argument. These are the six call sites:
> 
> dma_direct_map_page:
> 1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce
> 
> dma_direct_unmap_page:
> 2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
> 	swiotlb_bounce
> 4. is_swiotlb_buffer
> 5. swiotlb_tbl_unmap_single->swiotlb_del_transient
> 6. swiotlb_tbl_unmap_single->swiotlb_release_slots
> 
> Reduce the number of calls by finding the pool at a higher level, and
> passing it as an argument instead of searching again. A key change is
> for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
> and then pass this pool pointer to subsequent swiotlb functions.
> With these changes, a round-trip map/unmap pair requires only 2 calls
> to swiotlb_find_pool():
> 
> dma_direct_unmap_page:
> 1. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 2. is_swiotlb_buffer
> 
> These changes come from noticing the inefficiencies in a code review,
> not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
> swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
> so avoiding the redundant calls helps performance in a hot path.
> When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
> is minimal and the perf benefits are likely negligible, but no
> harm is done.
> 
> No functional change is intended.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> This patch trades off making many of the core swiotlb APIs take
> an additional argument in order to avoid duplicating calls to
> swiotlb_find_pool(). The current code seems rather wasteful in
> making 6 calls per round-trip, but I'm happy to accept others'
> judgment as to whether getting rid of the waste is worth the
> additional code complexity.

Quick ping on this RFC.  Is there any interest in moving forward?
Quite a few lines of code are affected because of adding the
additional "pool" argument to several functions, but the change
is conceptually pretty simple.

Michael

> 
>  drivers/iommu/dma-iommu.c | 27 ++++++++++++++------
>  drivers/xen/swiotlb-xen.c | 25 +++++++++++-------
>  include/linux/swiotlb.h   | 54 +++++++++++++++++++++------------------
>  kernel/dma/direct.c       | 12 ++++++---
>  kernel/dma/direct.h       | 18 ++++++++-----
>  kernel/dma/swiotlb.c      | 43 ++++++++++++++++---------------
>  6 files changed, 106 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f731e4b2a417..ab6bc37ecf90 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1073,6 +1073,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
> 
>  	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
> @@ -1081,21 +1082,25 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu(phys, size, dir);
> 
> -	if (is_swiotlb_buffer(dev, phys))
> -		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (pool)
> +		swiotlb_sync_single_for_cpu(dev, phys, size, dir, pool);
>  }
> 
>  static void iommu_dma_sync_single_for_device(struct device *dev,
>  		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
> 
>  	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
> 
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(dev, phys))
> -		swiotlb_sync_single_for_device(dev, phys, size, dir);
> +
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (pool)
> +		swiotlb_sync_single_for_device(dev, phys, size, dir, pool);
> 
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_device(phys, size, dir);
> @@ -1189,8 +1194,12 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  		arch_sync_dma_for_device(phys, size, dir);
> 
>  	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	if (iova == DMA_MAPPING_ERROR) {
> +		struct io_tlb_pool *pool = is_swiotlb_buffer(dev, phys);
> +
> +		if (pool)
> +			swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
> +	}
>  	return iova;
>  }
> 
> @@ -1199,6 +1208,7 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>  {
>  	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
> 
>  	phys = iommu_iova_to_phys(domain, dma_handle);
>  	if (WARN_ON(!phys))
> @@ -1209,8 +1219,9 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
> 
>  	__iommu_dma_unmap(dev, dma_handle, size);
> 
> -	if (unlikely(is_swiotlb_buffer(dev, phys)))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (unlikely(pool))
> +		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
>  }
> 
>  /*
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 6579ae3f6dac..7af8c8466e1d 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -88,7 +88,7 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  	return 0;
>  }
> 
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static struct io_tlb_pool *is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  {
>  	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>  	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  	 */
>  	if (pfn_valid(PFN_DOWN(paddr)))
>  		return is_swiotlb_buffer(dev, paddr);
> -	return 0;
> +	return NULL;
>  }
> 
>  #ifdef CONFIG_X86
> @@ -228,7 +228,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
>  	 */
>  	if (unlikely(!dma_capable(dev, dev_addr, size, true))) {
>  		swiotlb_tbl_unmap_single(dev, map, size, dir,
> -				attrs | DMA_ATTR_SKIP_CPU_SYNC);
> +				attrs | DMA_ATTR_SKIP_CPU_SYNC,
> +				swiotlb_find_pool(dev, map));
>  		return DMA_MAPPING_ERROR;
>  	}
> 
> @@ -254,6 +255,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
> +	struct io_tlb_pool *pool;
> 
>  	BUG_ON(dir == DMA_NONE);
> 
> @@ -265,8 +267,9 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  	}
> 
>  	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> -		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
> +	pool = is_xen_swiotlb_buffer(hwdev, dev_addr);
> +	if (pool)
> +		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs, pool);
>  }
> 
>  static void
> @@ -274,6 +277,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  		size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
> +	struct io_tlb_pool *pool;
> 
>  	if (!dev_is_dma_coherent(dev)) {
>  		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
> @@ -282,8 +286,9 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>  	}
> 
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> -		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
> +	pool = is_xen_swiotlb_buffer(dev, dma_addr);
> +	if (pool)
> +		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
>  }
> 
>  static void
> @@ -291,9 +296,11 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>  		size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
> +	struct io_tlb_pool *pool;
> 
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> -		swiotlb_sync_single_for_device(dev, paddr, size, dir);
> +	pool = is_xen_swiotlb_buffer(dev, dma_addr);
> +	if (pool)
> +		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
> 
>  	if (!dev_is_dma_coherent(dev)) {
>  		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 14bc10c1bb23..ce8651949123 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -42,24 +42,6 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  	int (*remap)(void *tlb, unsigned long nslabs));
>  extern void __init swiotlb_update_mem_attributes(void);
> 
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> -		size_t mapping_size,
> -		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
> -		unsigned long attrs);
> -
> -extern void swiotlb_tbl_unmap_single(struct device *hwdev,
> -				     phys_addr_t tlb_addr,
> -				     size_t mapping_size,
> -				     enum dma_data_direction dir,
> -				     unsigned long attrs);
> -
> -void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir);
> -void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir);
> -dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
> -		size_t size, enum dma_data_direction dir, unsigned long attrs);
> -
>  #ifdef CONFIG_SWIOTLB
> 
>  /**
> @@ -168,12 +150,12 @@ static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
>   * * %true if @paddr points into a bounce buffer
>   * * %false otherwise
>   */
> -static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> +static inline struct io_tlb_pool *is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
> 
>  	if (!mem)
> -		return false;
> +		return NULL;
> 
>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>  	/*
> @@ -187,10 +169,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
>  	 */
>  	smp_rmb();
> -	return READ_ONCE(dev->dma_uses_io_tlb) &&
> -		swiotlb_find_pool(dev, paddr);
> +	if (READ_ONCE(dev->dma_uses_io_tlb))
> +		return swiotlb_find_pool(dev, paddr);
> +	return NULL;
>  #else
> -	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
> +	if (paddr >= mem->defpool.start && paddr < mem->defpool.end)
> +		return &mem->defpool;
> +	return NULL;
>  #endif
>  }
> 
> @@ -201,6 +186,25 @@ static inline bool is_swiotlb_force_bounce(struct device *dev)
>  	return mem && mem->force_bounce;
>  }
> 
> +phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> +		size_t mapping_size,
> +		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
> +		unsigned long attrs);
> +
> +extern void swiotlb_tbl_unmap_single(struct device *hwdev,
> +				     phys_addr_t tlb_addr,
> +				     size_t mapping_size,
> +				     enum dma_data_direction dir,
> +				     unsigned long attrs,
> +				     struct io_tlb_pool *pool);
> +
> +void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
> +		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
> +void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
> +		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
> +dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
> +		size_t size, enum dma_data_direction dir, unsigned long attrs);
> +
>  void swiotlb_init(bool addressing_limited, unsigned int flags);
>  void __init swiotlb_exit(void);
>  void swiotlb_dev_init(struct device *dev);
> @@ -219,9 +223,9 @@ static inline void swiotlb_dev_init(struct device *dev)
>  {
>  }
> 
> -static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> +static inline struct io_tlb_pool *is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
> -	return false;
> +	return NULL;
>  }
>  static inline bool is_swiotlb_force_bounce(struct device *dev)
>  {
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4d543b1e9d57..50689afb0ffd 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -399,14 +399,16 @@ void dma_direct_sync_sg_for_device(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>  {
>  	struct scatterlist *sg;
> +	struct io_tlb_pool *pool;
>  	int i;
> 
>  	for_each_sg(sgl, sg, nents, i) {
>  		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
> 
> -		if (unlikely(is_swiotlb_buffer(dev, paddr)))
> +		pool = is_swiotlb_buffer(dev, paddr);
> +		if (unlikely(pool))
>  			swiotlb_sync_single_for_device(dev, paddr, sg->length,
> -						       dir);
> +						       dir, pool);
> 
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_device(paddr, sg->length,
> @@ -422,6 +424,7 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
>  {
>  	struct scatterlist *sg;
> +	struct io_tlb_pool *pool;
>  	int i;
> 
>  	for_each_sg(sgl, sg, nents, i) {
> @@ -430,9 +433,10 @@ void dma_direct_sync_sg_for_cpu(struct device *dev,
>  		if (!dev_is_dma_coherent(dev))
>  			arch_sync_dma_for_cpu(paddr, sg->length, dir);
> 
> -		if (unlikely(is_swiotlb_buffer(dev, paddr)))
> +		pool = is_swiotlb_buffer(dev, paddr);
> +		if (unlikely(pool))
>  			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
> -						    dir);
> +						    dir, pool);
> 
>  		if (dir == DMA_FROM_DEVICE)
>  			arch_dma_mark_clean(paddr, sg->length);
> diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
> index 18d346118fe8..72aa65558e07 100644
> --- a/kernel/dma/direct.h
> +++ b/kernel/dma/direct.h
> @@ -57,9 +57,11 @@ static inline void dma_direct_sync_single_for_device(struct device *dev,
>  		dma_addr_t addr, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = dma_to_phys(dev, addr);
> +	struct io_tlb_pool *pool;
> 
> -	if (unlikely(is_swiotlb_buffer(dev, paddr)))
> -		swiotlb_sync_single_for_device(dev, paddr, size, dir);
> +	pool = is_swiotlb_buffer(dev, paddr);
> +	if (unlikely(pool))
> +		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
> 
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_device(paddr, size, dir);
> @@ -69,14 +71,16 @@ static inline void dma_direct_sync_single_for_cpu(struct device *dev,
>  		dma_addr_t addr, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = dma_to_phys(dev, addr);
> +	struct io_tlb_pool *pool;
> 
>  	if (!dev_is_dma_coherent(dev)) {
>  		arch_sync_dma_for_cpu(paddr, size, dir);
>  		arch_sync_dma_for_cpu_all();
>  	}
> 
> -	if (unlikely(is_swiotlb_buffer(dev, paddr)))
> -		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
> +	pool = is_swiotlb_buffer(dev, paddr);
> +	if (unlikely(pool))
> +		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
> 
>  	if (dir == DMA_FROM_DEVICE)
>  		arch_dma_mark_clean(paddr, size);
> @@ -117,12 +121,14 @@ static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {
>  	phys_addr_t phys = dma_to_phys(dev, addr);
> +	struct io_tlb_pool *pool;
> 
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
>  		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
> 
> -	if (unlikely(is_swiotlb_buffer(dev, phys)))
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (unlikely(pool))
>  		swiotlb_tbl_unmap_single(dev, phys, size, dir,
> -					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
> +					 attrs | DMA_ATTR_SKIP_CPU_SYNC, pool);
>  }
>  #endif /* _KERNEL_DMA_DIRECT_H */
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index fe1ccb53596f..59b3e333651d 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -855,9 +855,8 @@ static unsigned int swiotlb_align_offset(struct device *dev,
>   * Bounce: copy the swiotlb buffer from or back to the original dma location
>   */
>  static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
> -			   enum dma_data_direction dir)
> +			   enum dma_data_direction dir, struct io_tlb_pool *mem)
>  {
> -	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
>  	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
>  	phys_addr_t orig_addr = mem->slots[index].orig_addr;
>  	size_t alloc_size = mem->slots[index].alloc_size;
> @@ -1435,13 +1434,13 @@ phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
>  	 * hardware behavior.  Use of swiotlb is supposed to be transparent,
>  	 * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
>  	 */
> -	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
> +	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE, pool);
>  	return tlb_addr;
>  }
> 
> -static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
> +static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr,
> +				  struct io_tlb_pool *mem)
>  {
> -	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
>  	unsigned long flags;
>  	unsigned int offset = swiotlb_align_offset(dev, 0, tlb_addr);
>  	int index, nslots, aindex;
> @@ -1505,11 +1504,9 @@ static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
>   *
>   * Return: %true if @tlb_addr belonged to a transient pool that was released.
>   */
> -static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
> +static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr,
> +				  struct io_tlb_pool *pool)
>  {
> -	struct io_tlb_pool *pool;
> -
> -	pool = swiotlb_find_pool(dev, tlb_addr);
>  	if (!pool->transient)
>  		return false;
> 
> @@ -1522,7 +1519,8 @@ static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
>  #else  /* !CONFIG_SWIOTLB_DYNAMIC */
> 
>  static inline bool swiotlb_del_transient(struct device *dev,
> -					 phys_addr_t tlb_addr)
> +					 phys_addr_t tlb_addr,
> +					 struct io_tlb_pool *pool)
>  {
>  	return false;
>  }
> @@ -1534,34 +1532,34 @@ static inline bool swiotlb_del_transient(struct device *dev,
>   */
>  void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
>  			      size_t mapping_size, enum dma_data_direction dir,
> -			      unsigned long attrs)
> +			      unsigned long attrs, struct io_tlb_pool *pool)
>  {
>  	/*
>  	 * First, sync the memory before unmapping the entry
>  	 */
>  	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
>  	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
> -		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
> +		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE, pool);
> 
> -	if (swiotlb_del_transient(dev, tlb_addr))
> +	if (swiotlb_del_transient(dev, tlb_addr, pool))
>  		return;
> -	swiotlb_release_slots(dev, tlb_addr);
> +	swiotlb_release_slots(dev, tlb_addr, pool);
>  }
> 
>  void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir)
> +		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
>  {
>  	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
> -		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
> +		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE, pool);
>  	else
>  		BUG_ON(dir != DMA_FROM_DEVICE);
>  }
> 
>  void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir)
> +		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
>  {
>  	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
> -		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE);
> +		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE, pool);
>  	else
>  		BUG_ON(dir != DMA_TO_DEVICE);
>  }
> @@ -1586,7 +1584,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
>  	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
>  	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
>  		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
> -			attrs | DMA_ATTR_SKIP_CPU_SYNC);
> +			attrs | DMA_ATTR_SKIP_CPU_SYNC,
> +			swiotlb_find_pool(dev, swiotlb_addr));
>  		dev_WARN_ONCE(dev, 1,
>  			"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
>  			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
> @@ -1774,11 +1773,13 @@ struct page *swiotlb_alloc(struct device *dev, size_t size)
>  bool swiotlb_free(struct device *dev, struct page *page, size_t size)
>  {
>  	phys_addr_t tlb_addr = page_to_phys(page);
> +	struct io_tlb_pool *pool;
> 
> -	if (!is_swiotlb_buffer(dev, tlb_addr))
> +	pool = is_swiotlb_buffer(dev, tlb_addr);
> +	if (!pool)
>  		return false;
> 
> -	swiotlb_release_slots(dev, tlb_addr);
> +	swiotlb_release_slots(dev, tlb_addr, pool);
> 
>  	return true;
>  }
> --
> 2.25.1
>
Christoph Hellwig June 27, 2024, 6:02 a.m. UTC | #2
On Wed, Jun 26, 2024 at 11:58:13PM +0000, Michael Kelley wrote:
> > This patch trades off making many of the core swiotlb APIs take
> > an additional argument in order to avoid duplicating calls to
> > swiotlb_find_pool(). The current code seems rather wasteful in
> > making 6 calls per round-trip, but I'm happy to accept others'
> > judgment as to whether getting rid of the waste is worth the
> > additional code complexity.
> 
> Quick ping on this RFC.  Is there any interest in moving forward?
> Quite a few lines of code are affected because of adding the
> additional "pool" argument to several functions, but the change
> is conceptually pretty simple.

Yes, this looks sensible to me.  I'm tempted to apply it.
Petr Tesařík June 27, 2024, 6:52 a.m. UTC | #3
On Thu, 27 Jun 2024 08:02:51 +0200
"hch@lst.de" <hch@lst.de> wrote:

> On Wed, Jun 26, 2024 at 11:58:13PM +0000, Michael Kelley wrote:
> > > This patch trades off making many of the core swiotlb APIs take
> > > an additional argument in order to avoid duplicating calls to
> > > swiotlb_find_pool(). The current code seems rather wasteful in
> > > making 6 calls per round-trip, but I'm happy to accept others'
> > > judgment as to whether getting rid of the waste is worth the
> > > additional code complexity.  
> > 
> > Quick ping on this RFC.  Is there any interest in moving forward?
> > Quite a few lines of code are affected because of adding the
> > additional "pool" argument to several functions, but the change
> > is conceptually pretty simple.  
> 
> Yes, this looks sensible to me.  I'm tempted to apply it.

Oh, right. The idea is good, but I was not able to reply immediately
and then forgot about it.

For the record, I considered an alternative: Call swiotlb_* functions
unconditionally and bail out early if the pool is NULL. But it's no
good, because is_swiotlb_buffer() can be inlined, so this approach
would replace a quick check with a function call. And then there's also
swiotlb_tbl_unmap_single()...
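
For concreteness, that rejected variant would have looked roughly like
this (a hypothetical sketch, not code from the patch):

	/* The NULL check moves into the callee, so every caller pays an
	 * out-of-line function call even for non-swiotlb buffers.
	 */
	void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
			size_t size, enum dma_data_direction dir)
	{
		struct io_tlb_pool *pool = swiotlb_find_pool(dev, tlb_addr);

		if (!pool)
			return;
		/* ... bounce as before ... */
	}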

I have only a very minor suggestion: Could is_swiotlb_buffer() be
renamed now that it no longer returns a bool? OTOH I have no good
immediate idea myself.

Petr T
Petr Tesařík June 27, 2024, 7:20 a.m. UTC | #4
On Thu,  6 Jun 2024 20:14:21 -0700
mhkelley58@gmail.com wrote:

> From: Michael Kelley <mhklinux@outlook.com>
> 
> With CONFIG_SWIOTLB_DYNAMIC enabled, each round-trip map/unmap pair
> in the swiotlb results in 6 calls to swiotlb_find_pool(). In multiple
> places, the pool is found and used in one function, and then must be
> found again in the next function that is called, because only the
> tlb_addr is passed as an argument. These are the six call sites:
> 
> dma_direct_map_page:
> 1. swiotlb_map->swiotlb_tbl_map_single->swiotlb_bounce
> 
> dma_direct_unmap_page:
> 2. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 3. dma_direct_sync_single_for_cpu->swiotlb_sync_single_for_cpu->
> 	swiotlb_bounce
> 4. is_swiotlb_buffer
> 5. swiotlb_tbl_unmap_single->swiotlb_del_transient
> 6. swiotlb_tbl_unmap_single->swiotlb_release_slots
> 
> Reduce the number of calls by finding the pool at a higher level, and
> passing it as an argument instead of searching again. A key change is
> for is_swiotlb_buffer() to return a pool pointer instead of a boolean,
> and then pass this pool pointer to subsequent swiotlb functions.
> With these changes, a round-trip map/unmap pair requires only 2 calls
> to swiotlb_find_pool():
> 
> dma_direct_unmap_page:
> 1. dma_direct_sync_single_for_cpu->is_swiotlb_buffer
> 2. is_swiotlb_buffer
> 
> These changes come from noticing the inefficiencies in a code review,
> not from performance measurements. With CONFIG_SWIOTLB_DYNAMIC,
> swiotlb_find_pool() is not trivial, and it uses an RCU read lock,
> so avoiding the redundant calls helps performance in a hot path.
> When CONFIG_SWIOTLB_DYNAMIC is *not* set, the code size reduction
> is minimal and the perf benefits are likely negligible, but no
> harm is done.
> 
> No functional change is intended.
> 
> Signed-off-by: Michael Kelley <mhklinux@outlook.com>
> ---
> This patch trades off making many of the core swiotlb APIs take
> an additional argument in order to avoid duplicating calls to
> swiotlb_find_pool(). The current code seems rather wasteful in
> making 6 calls per round-trip, but I'm happy to accept others'
> judgment as to whether getting rid of the waste is worth the
> additional code complexity.
> 
>  drivers/iommu/dma-iommu.c | 27 ++++++++++++++------
>  drivers/xen/swiotlb-xen.c | 25 +++++++++++-------
>  include/linux/swiotlb.h   | 54 +++++++++++++++++++++------------------
>  kernel/dma/direct.c       | 12 ++++++---
>  kernel/dma/direct.h       | 18 ++++++++-----
>  kernel/dma/swiotlb.c      | 43 ++++++++++++++++---------------
>  6 files changed, 106 insertions(+), 73 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index f731e4b2a417..ab6bc37ecf90 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1073,6 +1073,7 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
>  
>  	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
> @@ -1081,21 +1082,25 @@ static void iommu_dma_sync_single_for_cpu(struct device *dev,
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_cpu(phys, size, dir);
>  
> -	if (is_swiotlb_buffer(dev, phys))
> -		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (pool)
> +		swiotlb_sync_single_for_cpu(dev, phys, size, dir, pool);
>  }
>  
>  static void iommu_dma_sync_single_for_device(struct device *dev,
>  		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
>  
>  	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
>  		return;
>  
>  	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
> -	if (is_swiotlb_buffer(dev, phys))
> -		swiotlb_sync_single_for_device(dev, phys, size, dir);
> +
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (pool)
> +		swiotlb_sync_single_for_device(dev, phys, size, dir, pool);
>  
>  	if (!dev_is_dma_coherent(dev))
>  		arch_sync_dma_for_device(phys, size, dir);
> @@ -1189,8 +1194,12 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
>  		arch_sync_dma_for_device(phys, size, dir);
>  
>  	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
> -	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	if (iova == DMA_MAPPING_ERROR) {
> +		struct io_tlb_pool *pool = is_swiotlb_buffer(dev, phys);
> +
> +		if (pool)
> +			swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
> +	}
>  	return iova;
>  }
>  
> @@ -1199,6 +1208,7 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>  {
>  	struct iommu_domain *domain = iommu_get_dma_domain(dev);
>  	phys_addr_t phys;
> +	struct io_tlb_pool *pool;
>  
>  	phys = iommu_iova_to_phys(domain, dma_handle);
>  	if (WARN_ON(!phys))
> @@ -1209,8 +1219,9 @@ static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
>  
>  	__iommu_dma_unmap(dev, dma_handle, size);
>  
> -	if (unlikely(is_swiotlb_buffer(dev, phys)))
> -		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
> +	pool = is_swiotlb_buffer(dev, phys);
> +	if (unlikely(pool))
> +		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
>  }
>  
>  /*
> diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
> index 6579ae3f6dac..7af8c8466e1d 100644
> --- a/drivers/xen/swiotlb-xen.c
> +++ b/drivers/xen/swiotlb-xen.c
> @@ -88,7 +88,7 @@ static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
>  	return 0;
>  }
>  
> -static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
> +static struct io_tlb_pool *is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  {
>  	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
>  	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
> @@ -100,7 +100,7 @@ static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
>  	 */
>  	if (pfn_valid(PFN_DOWN(paddr)))
>  		return is_swiotlb_buffer(dev, paddr);
> -	return 0;
> +	return NULL;
>  }
>  
>  #ifdef CONFIG_X86
> @@ -228,7 +228,8 @@ static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
>  	 */
>  	if (unlikely(!dma_capable(dev, dev_addr, size, true))) {
>  		swiotlb_tbl_unmap_single(dev, map, size, dir,
> -				attrs | DMA_ATTR_SKIP_CPU_SYNC);
> +				attrs | DMA_ATTR_SKIP_CPU_SYNC,
> +				swiotlb_find_pool(dev, map));
>  		return DMA_MAPPING_ERROR;
>  	}
>  
> @@ -254,6 +255,7 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  		size_t size, enum dma_data_direction dir, unsigned long attrs)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
> +	struct io_tlb_pool *pool;
>  
>  	BUG_ON(dir == DMA_NONE);
>  
> @@ -265,8 +267,9 @@ static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
>  	}
>  
>  	/* NOTE: We use dev_addr here, not paddr! */
> -	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
> -		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
> +	pool = is_xen_swiotlb_buffer(hwdev, dev_addr);
> +	if (pool)
> +		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs, pool);
>  }
>  
>  static void
> @@ -274,6 +277,7 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  		size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
> +	struct io_tlb_pool *pool;
>  
>  	if (!dev_is_dma_coherent(dev)) {
>  		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
> @@ -282,8 +286,9 @@ xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
>  			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
>  	}
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> -		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
> +	pool = is_xen_swiotlb_buffer(dev, dma_addr);
> +	if (pool)
> +		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
>  }
>  
>  static void
> @@ -291,9 +296,11 @@ xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
>  		size_t size, enum dma_data_direction dir)
>  {
>  	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
> +	struct io_tlb_pool *pool;
>  
> -	if (is_xen_swiotlb_buffer(dev, dma_addr))
> -		swiotlb_sync_single_for_device(dev, paddr, size, dir);
> +	pool = is_xen_swiotlb_buffer(dev, dma_addr);
> +	if (pool)
> +		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
>  
>  	if (!dev_is_dma_coherent(dev)) {
>  		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
> diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
> index 14bc10c1bb23..ce8651949123 100644
> --- a/include/linux/swiotlb.h
> +++ b/include/linux/swiotlb.h
> @@ -42,24 +42,6 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
>  	int (*remap)(void *tlb, unsigned long nslabs));
>  extern void __init swiotlb_update_mem_attributes(void);
>  
> -phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
> -		size_t mapping_size,
> -		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
> -		unsigned long attrs);
> -
> -extern void swiotlb_tbl_unmap_single(struct device *hwdev,
> -				     phys_addr_t tlb_addr,
> -				     size_t mapping_size,
> -				     enum dma_data_direction dir,
> -				     unsigned long attrs);
> -
> -void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir);
> -void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
> -		size_t size, enum dma_data_direction dir);
> -dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
> -		size_t size, enum dma_data_direction dir, unsigned long attrs);
> -
>  #ifdef CONFIG_SWIOTLB
>  
>  /**
> @@ -168,12 +150,12 @@ static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
>   * * %true if @paddr points into a bounce buffer
>   * * %false otherwise
>   */
> -static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> +static inline struct io_tlb_pool *is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  {
>  	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
>  
>  	if (!mem)
> -		return false;
> +		return NULL;
>  
>  #ifdef CONFIG_SWIOTLB_DYNAMIC
>  	/*
> @@ -187,10 +169,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
>  	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
>  	 */
>  	smp_rmb();
> -	return READ_ONCE(dev->dma_uses_io_tlb) &&
> -		swiotlb_find_pool(dev, paddr);
> +	if (READ_ONCE(dev->dma_uses_io_tlb))
> +		return swiotlb_find_pool(dev, paddr);
> +	return NULL;
>  #else
> -	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
> +	if (paddr >= mem->defpool.start && paddr < mem->defpool.end)
> +		return &mem->defpool;

Why are we open-coding swiotlb_find_pool() here? It does not make a
difference now, but if swiotlb_find_pool() were to change, both places
would have to be updated.

Does it save a reload from dev->dma_io_tlb_mem? IOW is the compiler
unable to optimize it away?

What about this (functionally identical) variant:

#ifdef CONFIG_SWIOTLB_DYNAMIC
	smp_rmb();
	if (!READ_ONCE(dev->dma_uses_io_tlb))
		return NULL;
#else
> 	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
		return NULL;
#endif

	return swiotlb_find_pool(dev, paddr);

Petr T
Michael Kelley June 27, 2024, 2:59 p.m. UTC | #5
From: Petr Tesařík <petr@tesarici.cz> Sent: Wednesday, June 26, 2024 11:52 PM
> 
> Oh, right. The idea is good, but I was not able to reply immediately
> and then forgot about it.
> 
> For the record, I considered an alternative: Call swiotlb_* functions
> unconditionally and bail out early if the pool is NULL. But it's no
> good, because is_swiotlb_buffer() can be inlined, so this approach
> would replace a quick check with a function call. And then there's also
> swiotlb_tbl_unmap_single()...
> 
> I have only a very minor suggestion: Could is_swiotlb_buffer() be
> renamed now that it no longer returns a bool? OTOH I have no good
> immediate idea myself.
>

Conceptually, it's still being used as a boolean function based on
whether the return value is NULL.  Renaming it to swiotlb_get_pool()
more accurately describes the return value, but obscures the
intent of determining if it is a swiotlb buffer.  I'll think about it.
Suggestions are welcome.

Michael
Michael Kelley June 27, 2024, 3:04 p.m. UTC | #6
From: Petr Tesařík <petr@tesarici.cz> Sent: Thursday, June 27, 2024 12:21 AM

[...]

> > @@ -187,10 +169,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> >  	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
> >  	 */
> >  	smp_rmb();
> > -	return READ_ONCE(dev->dma_uses_io_tlb) &&
> > -		swiotlb_find_pool(dev, paddr);
> > +	if (READ_ONCE(dev->dma_uses_io_tlb))
> > +		return swiotlb_find_pool(dev, paddr);
> > +	return NULL;
> >  #else
> > -	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
> > +	if (paddr >= mem->defpool.start && paddr < mem->defpool.end)
> > +		return &mem->defpool;
> 
> Why are we open-coding swiotlb_find_pool() here? It does not make a
> difference now, but if swiotlb_find_pool() were to change, both places
> would have to be updated.
> 
> Does it save a reload from dev->dma_io_tlb_mem? IOW is the compiler
> unable to optimize it away?
> 
> What about this (functionally identical) variant:
> 
> #ifdef CONFIG_SWIOTLB_DYNAMIC
> 	smp_rmb();
> 	if (!READ_ONCE(dev->dma_uses_io_tlb))
> 		return NULL;
> #else
> > 	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
> 		return NULL;
> #endif
> 
> 	return swiotlb_find_pool(dev, paddr);
> 

Yeah, I see your point. I'll try this and see what the generated code
looks like. It might take me a couple of days to get to it.

Michael
Christoph Hellwig June 27, 2024, 3:25 p.m. UTC | #7
On Thu, Jun 27, 2024 at 02:59:03PM +0000, Michael Kelley wrote:
> Conceptually, it's still being used as a boolean function based on
> whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> more accurately describes the return value, but obscures the
> intent of determining if it is a swiotlb buffer.  I'll think about it.
> Suggestions are welcome.

Just keep is_swiotlb_buffer as a trivial inline helper that returns
bool.
Michael Kelley June 27, 2024, 4:02 p.m. UTC | #8
From: hch@lst.de <hch@lst.de> Sent: Thursday, June 27, 2024 8:25 AM
> 
> On Thu, Jun 27, 2024 at 02:59:03PM +0000, Michael Kelley wrote:
> > Conceptually, it's still being used as a boolean function based on
> > whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> > more accurately describes the return value, but obscures the
> > intent of determining if it is a swiotlb buffer.  I'll think about it.
> > Suggestions are welcome.
> 
> Just keep is_swiotlb_buffer as a trivial inline helper that returns
> bool.

I don't understand what you are suggesting.  Could you elaborate a bit?
is_swiotlb_buffer() can't be trivial when CONFIG_SWIOTLB_DYNAMIC
is set.

Michael
Christoph Hellwig June 28, 2024, 6:01 a.m. UTC | #9
On Thu, Jun 27, 2024 at 04:02:59PM +0000, Michael Kelley wrote:
> > > Conceptually, it's still being used as a boolean function based on
> > > whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> > > more accurately describes the return value, but obscures the
> > > intent of determining if it is a swiotlb buffer.  I'll think about it.
> > > Suggestions are welcome.
> > 
> > Just keep is_swiotlb_buffer as a trivial inline helper that returns
> > bool.
> 
> I don't understand what you are suggesting.  Could you elaborate a bit?
> is_swiotlb_buffer() can't be trivial when CONFIG_SWIOTLB_DYNAMIC
> is set.

Call the main function that finds and returns the pool swiotlb_find_pool,
and then have an is_swiotlb_buffer wrapper that just returns bool.
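
For illustration, such a wrapper could be as simple as this sketch (not
actual committed code):

	static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
	{
		return !!swiotlb_find_pool(dev, paddr);
	}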
Petr Tesařík June 28, 2024, 7:47 a.m. UTC | #10
On Fri, 28 Jun 2024 08:01:29 +0200
"hch@lst.de" <hch@lst.de> wrote:

> On Thu, Jun 27, 2024 at 04:02:59PM +0000, Michael Kelley wrote:
> > > > Conceptually, it's still being used as a boolean function based on
> > > > whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> > > > more accurately describes the return value, but obscures the
> > > > intent of determining if it is a swiotlb buffer.  I'll think about it.
> > > > Suggestions are welcome.  
> > > 
> > > Just keep is_swiotlb_buffer as a trivial inline helper that returns
> > > bool.  
> > 
> > I don't understand what you are suggesting.  Could you elaborate a bit?
> > is_swiotlb_buffer() can't be trivial when CONFIG_SWIOTLB_DYNAMIC
> > is set.  
> 
> Call the main function that finds and returns the pool swiotlb_find_pool,
> and then have an is_swiotlb_buffer wrapper that just returns bool.
> 

I see. That's not my point. After applying Michael's patch, the return
value is always used, except here:

bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
{
	return !dev_is_dma_coherent(dev) ||
	       is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
}

I don't think this one occurrence in the entire source tree is worth a
separate inline function.

If nobody has a better idea, I'm not really offended by keeping the
original name, is_swiotlb_buffer(). It would just become the only
function which starts with "is_" and provides more information in the
return value than a simple yes/no, and I thought there must be an
unwritten convention about that.

Petr T
Michael Kelley June 29, 2024, 3:53 p.m. UTC | #11
From: Michael Kelley <mhklinux@outlook.com> Sent: Thursday, June 27, 2024 8:05 AM
> 
> From: Petr Tesařík <petr@tesarici.cz> Sent: Thursday, June 27, 2024 12:21 AM
> 
> [...]
> 
> > > @@ -187,10 +169,13 @@ static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
> > >  	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
> > >  	 */
> > >  	smp_rmb();
> > > -	return READ_ONCE(dev->dma_uses_io_tlb) &&
> > > -		swiotlb_find_pool(dev, paddr);
> > > +	if (READ_ONCE(dev->dma_uses_io_tlb))
> > > +		return swiotlb_find_pool(dev, paddr);
> > > +	return NULL;
> > >  #else
> > > -	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
> > > +	if (paddr >= mem->defpool.start && paddr < mem->defpool.end)
> > > +		return &mem->defpool;
> >
> > Why are we open-coding swiotlb_find_pool() here? It does not make a
> > difference now, but if swiotlb_find_pool() were to change, both places
> > would have to be updated.
> >
> > Does it save a reload from dev->dma_io_tlb_mem? IOW is the compiler
> > unable to optimize it away?
> >
> > What about this (functionally identical) variant:
> >
> > #ifdef CONFIG_SWIOTLB_DYNAMIC
> > 	smp_rmb();
> > 	if (!READ_ONCE(dev->dma_uses_io_tlb))
> > 		return NULL;
> > #else
> > 	if (paddr < mem->defpool.start || paddr >= mem->defpool.end)
> > 		return NULL;
> > #endif
> >
> > 	return swiotlb_find_pool(dev, paddr);
> >
> 
> Yeah, I see your point. I'll try this and see what the generated code
> looks like. It might take me a couple of days to get to it.
> 

With and without CONFIG_SWIOTLB_DYNAMIC, there's no meaningful
difference in the generated code for x86 or for arm64.  

I'll incorporate this change into v2.

Michael
Michael Kelley June 29, 2024, 3:55 p.m. UTC | #12
From: Petr Tesařík <petr@tesarici.cz> Sent: Friday, June 28, 2024 12:47 AM
> 
> V Fri, 28 Jun 2024 08:01:29 +0200
> "hch@lst.de" <hch@lst.de> napsáno:
> 
> > On Thu, Jun 27, 2024 at 04:02:59PM +0000, Michael Kelley wrote:
> > > > > Conceptually, it's still being used as a boolean function based on
> > > > > whether the return value is NULL.  Renaming it to swiotlb_get_pool()
> > > > > more accurately describes the return value, but obscures the
> > > > > intent of determining if it is a swiotlb buffer.  I'll think about it.
> > > > > Suggestions are welcome.
> > > >
> > > > Just keep is_swiotlb_buffer as a trivial inline helper that returns
> > > > bool.
> > >
> > > I don't understand what you are suggesting.  Could you elaborate a bit?
> > > is_swiotlb_buffer() can't be trivial when CONFIG_SWIOTLB_DYNAMIC
> > > is set.
> >
> > Call the main function that finds and returns the pool swiotlb_find_pool,
> > and then have an is_swiotlb_buffer wrapper that just returns bool.
> >
> 
> I see. That's not my point. After applying Michael's patch, the return
> value is always used, except here:
> 
> bool dma_direct_need_sync(struct device *dev, dma_addr_t dma_addr)
> {
> 	return !dev_is_dma_coherent(dev) ||
> 	       is_swiotlb_buffer(dev, dma_to_phys(dev, dma_addr));
> }
> 
> I don't think this one occurrence in the entire source tree is worth a
> separate inline function.
> 
> If nobody has a better idea, I'm not really offended by keeping the
> original name, is_swiotlb_buffer(). It would just become the only
> function which starts with "is_" and provides more information in the
> return value than a simple yes/no, and I thought there must be an
> unwritten convention about that.
> 

Unless there is further discussion on this point, I'll just keep the original
"is_swiotlb_buffer()" in v2.

Michael
Christoph Hellwig June 30, 2024, 5:55 a.m. UTC | #13
On Sat, Jun 29, 2024 at 03:55:58PM +0000, Michael Kelley wrote:
> Unless there is further discussion on this point, I'll just keep the original
> "is_swiotlb_buffer()" in v2.

That is the wrong name for something that returns the pool as pointed
out before.
Michael Kelley June 30, 2024, 2:02 p.m. UTC | #14
From: hch@lst.de <hch@lst.de> Sent: Saturday, June 29, 2024 10:56 PM
> 
> On Sat, Jun 29, 2024 at 03:55:58PM +0000, Michael Kelley wrote:
> > Unless there is further discussion on this point, I'll just keep the original
> > "is_swiotlb_buffer()" in v2.
> 
> That is the wrong name for something that returns the pool as pointed
> out before.

OK. Since any new name could cause confusion with the existing
swiotlb_find_pool(), here's my proposal:

1) Rename is_swiotlb_buffer() to swiotlb_find_pool(), since it
now returns a pool.  A NULL return value indicates that the
paddr is not an swiotlb buffer.

2) Similarly, rename is_xen_swiotlb_buffer() to
xen_swiotlb_find_pool()

3) The existing swiotlb_find_pool() has the same function signature,
but it is used only where the paddr is known to be an swiotlb buffer
and hence always succeeds. Rename it to __swiotlb_find_pool() as
the "internal" version of swiotlb_find_pool().

4) Do you still want is_swiotlb_buffer() as a trivial wrapper around
the new swiotlb_find_pool(), for use solely in dma_direct_need_sync()
where only a Boolean is needed and not the pool?
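
For illustration, the renamed declarations might read as follows (a
sketch of the proposal only, ahead of an actual v2):

	/* was is_swiotlb_buffer(): returns the pool, or NULL if paddr is
	 * not an swiotlb buffer
	 */
	struct io_tlb_pool *swiotlb_find_pool(struct device *dev, phys_addr_t paddr);

	/* was is_xen_swiotlb_buffer() */
	struct io_tlb_pool *xen_swiotlb_find_pool(struct device *dev, dma_addr_t dma_addr);

	/* was swiotlb_find_pool(): internal; paddr is already known to be
	 * an swiotlb buffer, so the lookup always succeeds
	 */
	struct io_tlb_pool *__swiotlb_find_pool(struct device *dev, phys_addr_t paddr);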

Thanks,

Michael
Christoph Hellwig July 1, 2024, 4:36 a.m. UTC | #15
On Sun, Jun 30, 2024 at 02:02:52PM +0000, Michael Kelley wrote:
> 1) Rename is_swiotlb_buffer() to swiotlb_find_pool(), since it
> now returns a pool.  A NULL return value indicates that the
> paddr is not an swiotlb buffer.
> 
> 2) Similarly, rename is_xen_swiotlb_buffer() to
> xen_swiotlb_find_pool()
> 
> 3) The existing swiotlb_find_pool() has the same function signature,
> but it is used only where the paddr is known to be an swiotlb buffer
> and hence always succeeds. Rename it to __swiotlb_find_pool() as
> the "internal" version of swiotlb_find_pool().

Sounds good.

> 4) Do you still want is_swiotlb_buffer() as a trivial wrapper around
> the new swiotlb_find_pool(), for use solely in dma_direct_need_sync()
> where only a Boolean is needed and not the pool?

If there is really just a single caller left we can skip the wrapper,
otherwise it might be handy.
Petr Tesařík July 1, 2024, 5:47 a.m. UTC | #16
On Mon, 1 Jul 2024 06:36:15 +0200
"hch@lst.de" <hch@lst.de> wrote:

> On Sun, Jun 30, 2024 at 02:02:52PM +0000, Michael Kelley wrote:
> > 1) Rename is_swiotlb_buffer() to swiotlb_find_pool(), since it
> > now returns a pool.  A NULL return value indicates that the
> > paddr is not an swiotlb buffer.
> > 
> > 2) Similarly, rename is_xen_swiotlb_buffer() to
> > xen_swiotlb_find_pool()
> > 
> > 3) The existing swiotlb_find_pool() has the same function signature,
> > but it is used only where the paddr is known to be an swiotlb buffer
> > and hence always succeeds. Rename it to __swiotlb_find_pool() as
> > the "internal" version of swiotlb_find_pool().  
> 
> Sounds good.

Agreed. Most importantly, the "nice" name swiotlb_find_pool() is the
one exposed to external users. The difference between swiotlb_find_pool() and
__swiotlb_find_pool() is that the former can be used with any device,
and the latter (internal) only with devices that make some use of
swiotlb. The main reason to keep them separate is that the internal
function should not be inlined if CONFIG_SWIOTLB_DYNAMIC=y.

I hope somebody finds my explanation useful when they touch the code
again in a few years from now. ;-)

> > 4) Do you still want is_swiotlb_buffer() as a trivial wrapper around
> > the new swiotlb_find_pool(), for use solely in dma_direct_need_sync()
> > where only a Boolean is needed and not the pool?  
> 
> If there is really just a single caller left we can skip the wrapper,
> otherwise it might be handy.

AFAICS dma_direct_need_sync() is the only such place.

Petr T
Patch

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index f731e4b2a417..ab6bc37ecf90 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1073,6 +1073,7 @@  static void iommu_dma_sync_single_for_cpu(struct device *dev,
 		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
@@ -1081,21 +1082,25 @@  static void iommu_dma_sync_single_for_cpu(struct device *dev,
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_cpu(phys, size, dir);
 
-	if (is_swiotlb_buffer(dev, phys))
-		swiotlb_sync_single_for_cpu(dev, phys, size, dir);
+	pool = is_swiotlb_buffer(dev, phys);
+	if (pool)
+		swiotlb_sync_single_for_cpu(dev, phys, size, dir, pool);
 }
 
 static void iommu_dma_sync_single_for_device(struct device *dev,
 		dma_addr_t dma_handle, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	if (dev_is_dma_coherent(dev) && !dev_use_swiotlb(dev, size, dir))
 		return;
 
 	phys = iommu_iova_to_phys(iommu_get_dma_domain(dev), dma_handle);
-	if (is_swiotlb_buffer(dev, phys))
-		swiotlb_sync_single_for_device(dev, phys, size, dir);
+
+	pool = is_swiotlb_buffer(dev, phys);
+	if (pool)
+		swiotlb_sync_single_for_device(dev, phys, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_device(phys, size, dir);
@@ -1189,8 +1194,12 @@  static dma_addr_t iommu_dma_map_page(struct device *dev, struct page *page,
 		arch_sync_dma_for_device(phys, size, dir);
 
 	iova = __iommu_dma_map(dev, phys, size, prot, dma_mask);
-	if (iova == DMA_MAPPING_ERROR && is_swiotlb_buffer(dev, phys))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	if (iova == DMA_MAPPING_ERROR) {
+		struct io_tlb_pool *pool = is_swiotlb_buffer(dev, phys);
+
+		if (pool)
+			swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
+	}
 	return iova;
 }
 
@@ -1199,6 +1208,7 @@  static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 {
 	struct iommu_domain *domain = iommu_get_dma_domain(dev);
 	phys_addr_t phys;
+	struct io_tlb_pool *pool;
 
 	phys = iommu_iova_to_phys(domain, dma_handle);
 	if (WARN_ON(!phys))
@@ -1209,8 +1219,9 @@  static void iommu_dma_unmap_page(struct device *dev, dma_addr_t dma_handle,
 
 	__iommu_dma_unmap(dev, dma_handle, size);
 
-	if (unlikely(is_swiotlb_buffer(dev, phys)))
-		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
+	pool = is_swiotlb_buffer(dev, phys);
+	if (unlikely(pool))
+		swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs, pool);
 }
 
 /*
diff --git a/drivers/xen/swiotlb-xen.c b/drivers/xen/swiotlb-xen.c
index 6579ae3f6dac..7af8c8466e1d 100644
--- a/drivers/xen/swiotlb-xen.c
+++ b/drivers/xen/swiotlb-xen.c
@@ -88,7 +88,7 @@  static inline int range_straddles_page_boundary(phys_addr_t p, size_t size)
 	return 0;
 }
 
-static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
+static struct io_tlb_pool *is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 {
 	unsigned long bfn = XEN_PFN_DOWN(dma_to_phys(dev, dma_addr));
 	unsigned long xen_pfn = bfn_to_local_pfn(bfn);
@@ -100,7 +100,7 @@  static int is_xen_swiotlb_buffer(struct device *dev, dma_addr_t dma_addr)
 	 */
 	if (pfn_valid(PFN_DOWN(paddr)))
 		return is_swiotlb_buffer(dev, paddr);
-	return 0;
+	return NULL;
 }
 
 #ifdef CONFIG_X86
@@ -228,7 +228,8 @@  static dma_addr_t xen_swiotlb_map_page(struct device *dev, struct page *page,
 	 */
 	if (unlikely(!dma_capable(dev, dev_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, map, size, dir,
-				attrs | DMA_ATTR_SKIP_CPU_SYNC);
+				attrs | DMA_ATTR_SKIP_CPU_SYNC,
+				swiotlb_find_pool(dev, map));
 		return DMA_MAPPING_ERROR;
 	}
 
@@ -254,6 +255,7 @@  static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t paddr = xen_dma_to_phys(hwdev, dev_addr);
+	struct io_tlb_pool *pool;
 
 	BUG_ON(dir == DMA_NONE);
 
@@ -265,8 +267,9 @@  static void xen_swiotlb_unmap_page(struct device *hwdev, dma_addr_t dev_addr,
 	}
 
 	/* NOTE: We use dev_addr here, not paddr! */
-	if (is_xen_swiotlb_buffer(hwdev, dev_addr))
-		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs);
+	pool = is_xen_swiotlb_buffer(hwdev, dev_addr);
+	if (pool)
+		swiotlb_tbl_unmap_single(hwdev, paddr, size, dir, attrs, pool);
 }
 
 static void
@@ -274,6 +277,7 @@  xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
+	struct io_tlb_pool *pool;
 
 	if (!dev_is_dma_coherent(dev)) {
 		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
@@ -282,8 +286,9 @@  xen_swiotlb_sync_single_for_cpu(struct device *dev, dma_addr_t dma_addr,
 			xen_dma_sync_for_cpu(dev, dma_addr, size, dir);
 	}
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
-		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
+	pool = is_xen_swiotlb_buffer(dev, dma_addr);
+	if (pool)
+		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
 }
 
 static void
@@ -291,9 +296,11 @@  xen_swiotlb_sync_single_for_device(struct device *dev, dma_addr_t dma_addr,
 		size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
+	struct io_tlb_pool *pool;
 
-	if (is_xen_swiotlb_buffer(dev, dma_addr))
-		swiotlb_sync_single_for_device(dev, paddr, size, dir);
+	pool = is_xen_swiotlb_buffer(dev, dma_addr);
+	if (pool)
+		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev)) {
 		if (pfn_valid(PFN_DOWN(dma_to_phys(dev, dma_addr))))
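The Xen wrappers follow suit: is_xen_swiotlb_buffer() now forwards whatever is_swiotlb_buffer() returns after the bfn-to-pfn translation, so NULL keeps meaning "not ours". A hedged sketch of the shared shape of both sync paths (xen_sync_example() is illustrative):

static void xen_sync_example(struct device *dev, dma_addr_t dma_addr,
			     size_t size, enum dma_data_direction dir)
{
	phys_addr_t paddr = xen_dma_to_phys(dev, dma_addr);
	struct io_tlb_pool *pool = is_xen_swiotlb_buffer(dev, dma_addr);

	if (pool)
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
}

Note the lookup keys off the DMA address while the sync uses the physical address, exactly as in the surrounding code.
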
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 14bc10c1bb23..ce8651949123 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -42,24 +42,6 @@  int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	int (*remap)(void *tlb, unsigned long nslabs));
 extern void __init swiotlb_update_mem_attributes(void);
 
-phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
-		size_t mapping_size,
-		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
-		unsigned long attrs);
-
-extern void swiotlb_tbl_unmap_single(struct device *hwdev,
-				     phys_addr_t tlb_addr,
-				     size_t mapping_size,
-				     enum dma_data_direction dir,
-				     unsigned long attrs);
-
-void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir);
-void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir);
-dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
-		size_t size, enum dma_data_direction dir, unsigned long attrs);
-
 #ifdef CONFIG_SWIOTLB
 
 /**
@@ -168,12 +150,12 @@  static inline struct io_tlb_pool *swiotlb_find_pool(struct device *dev,
- * * %true if @paddr points into a bounce buffer
- * * %false otherwise
+ * * Pool address if @paddr points into a bounce buffer
+ * * NULL otherwise
  */
-static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
+static inline struct io_tlb_pool *is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
 	struct io_tlb_mem *mem = dev->dma_io_tlb_mem;
 
 	if (!mem)
-		return false;
+		return NULL;
 
 #ifdef CONFIG_SWIOTLB_DYNAMIC
 	/*
@@ -187,10 +169,13 @@  static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 	 * This barrier pairs with smp_mb() in swiotlb_find_slots().
 	 */
 	smp_rmb();
-	return READ_ONCE(dev->dma_uses_io_tlb) &&
-		swiotlb_find_pool(dev, paddr);
+	if (READ_ONCE(dev->dma_uses_io_tlb))
+		return swiotlb_find_pool(dev, paddr);
+	return NULL;
 #else
-	return paddr >= mem->defpool.start && paddr < mem->defpool.end;
+	if (paddr >= mem->defpool.start && paddr < mem->defpool.end)
+		return &mem->defpool;
+	return NULL;
 #endif
 }
 
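Side by side, the call-site change this return type enables (illustrative, using the sync-for-CPU path):

	/* Before: boolean test, then a second lookup inside the callee. */
	if (is_swiotlb_buffer(dev, paddr))
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);

	/* After: one lookup, pointer reused by the callee. */
	struct io_tlb_pool *pool = is_swiotlb_buffer(dev, paddr);

	if (pool)
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
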
@@ -201,6 +186,25 @@  static inline bool is_swiotlb_force_bounce(struct device *dev)
 	return mem && mem->force_bounce;
 }
 
+phys_addr_t swiotlb_tbl_map_single(struct device *hwdev, phys_addr_t phys,
+		size_t mapping_size,
+		unsigned int alloc_aligned_mask, enum dma_data_direction dir,
+		unsigned long attrs);
+
+extern void swiotlb_tbl_unmap_single(struct device *hwdev,
+				     phys_addr_t tlb_addr,
+				     size_t mapping_size,
+				     enum dma_data_direction dir,
+				     unsigned long attrs,
+				     struct io_tlb_pool *pool);
+
+void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
+void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool);
+dma_addr_t swiotlb_map(struct device *dev, phys_addr_t phys,
+		size_t size, enum dma_data_direction dir, unsigned long attrs);
+
 void swiotlb_init(bool addressing_limited, unsigned int flags);
 void __init swiotlb_exit(void);
 void swiotlb_dev_init(struct device *dev);
@@ -219,9 +223,9 @@  static inline void swiotlb_dev_init(struct device *dev)
 {
 }
 
-static inline bool is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
+static inline struct io_tlb_pool *is_swiotlb_buffer(struct device *dev, phys_addr_t paddr)
 {
-	return false;
+	return NULL;
 }
 static inline bool is_swiotlb_force_bounce(struct device *dev)
 {
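Under !CONFIG_SWIOTLB the stub returns NULL rather than false, so the pointer-as-boolean idiom at call sites keeps working unchanged:

	/* Stub path: is_swiotlb_buffer() is a constant NULL here, so the
	 * compiler can typically drop the whole branch (assuming the
	 * callee's prototype is still in scope for the caller). */
	struct io_tlb_pool *pool = is_swiotlb_buffer(dev, paddr);

	if (pool)
		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
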
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 4d543b1e9d57..50689afb0ffd 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -399,14 +399,16 @@  void dma_direct_sync_sg_for_device(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
+	struct io_tlb_pool *pool;
 	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
 		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));
 
-		if (unlikely(is_swiotlb_buffer(dev, paddr)))
+		pool = is_swiotlb_buffer(dev, paddr);
+		if (unlikely(pool))
 			swiotlb_sync_single_for_device(dev, paddr, sg->length,
-						       dir);
+						       dir, pool);
 
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_device(paddr, sg->length,
@@ -422,6 +424,7 @@  void dma_direct_sync_sg_for_cpu(struct device *dev,
 		struct scatterlist *sgl, int nents, enum dma_data_direction dir)
 {
 	struct scatterlist *sg;
+	struct io_tlb_pool *pool;
 	int i;
 
 	for_each_sg(sgl, sg, nents, i) {
@@ -430,9 +433,10 @@  void dma_direct_sync_sg_for_cpu(struct device *dev,
 		if (!dev_is_dma_coherent(dev))
 			arch_sync_dma_for_cpu(paddr, sg->length, dir);
 
-		if (unlikely(is_swiotlb_buffer(dev, paddr)))
+		pool = is_swiotlb_buffer(dev, paddr);
+		if (unlikely(pool))
 			swiotlb_sync_single_for_cpu(dev, paddr, sg->length,
-						    dir);
+						    dir, pool);
 
 		if (dir == DMA_FROM_DEVICE)
 			arch_dma_mark_clean(paddr, sg->length);
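In the scatter-gather loops the lookup is per segment, so the pool pointer is refreshed on every iteration; nothing assumes all segments share a pool. The loop shape, reduced to a sketch (declarations elided):

	for_each_sg(sgl, sg, nents, i) {
		phys_addr_t paddr = dma_to_phys(dev, sg_dma_address(sg));

		pool = is_swiotlb_buffer(dev, paddr);	/* per segment */
		if (unlikely(pool))
			swiotlb_sync_single_for_device(dev, paddr,
						       sg->length, dir, pool);
	}

The unlikely() annotations stay accurate: a non-NULL pool is still the exceptional case for most devices.
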
diff --git a/kernel/dma/direct.h b/kernel/dma/direct.h
index 18d346118fe8..72aa65558e07 100644
--- a/kernel/dma/direct.h
+++ b/kernel/dma/direct.h
@@ -57,9 +57,11 @@  static inline void dma_direct_sync_single_for_device(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
-	if (unlikely(is_swiotlb_buffer(dev, paddr)))
-		swiotlb_sync_single_for_device(dev, paddr, size, dir);
+	pool = is_swiotlb_buffer(dev, paddr);
+	if (unlikely(pool))
+		swiotlb_sync_single_for_device(dev, paddr, size, dir, pool);
 
 	if (!dev_is_dma_coherent(dev))
 		arch_sync_dma_for_device(paddr, size, dir);
@@ -69,14 +71,16 @@  static inline void dma_direct_sync_single_for_cpu(struct device *dev,
 		dma_addr_t addr, size_t size, enum dma_data_direction dir)
 {
 	phys_addr_t paddr = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
 	if (!dev_is_dma_coherent(dev)) {
 		arch_sync_dma_for_cpu(paddr, size, dir);
 		arch_sync_dma_for_cpu_all();
 	}
 
-	if (unlikely(is_swiotlb_buffer(dev, paddr)))
-		swiotlb_sync_single_for_cpu(dev, paddr, size, dir);
+	pool = is_swiotlb_buffer(dev, paddr);
+	if (unlikely(pool))
+		swiotlb_sync_single_for_cpu(dev, paddr, size, dir, pool);
 
 	if (dir == DMA_FROM_DEVICE)
 		arch_dma_mark_clean(paddr, size);
@@ -117,12 +121,14 @@  static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
 		size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
 	phys_addr_t phys = dma_to_phys(dev, addr);
+	struct io_tlb_pool *pool;
 
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
 		dma_direct_sync_single_for_cpu(dev, addr, size, dir);
 
-	if (unlikely(is_swiotlb_buffer(dev, phys)))
+	pool = is_swiotlb_buffer(dev, phys);
+	if (unlikely(pool))
 		swiotlb_tbl_unmap_single(dev, phys, size, dir,
-					 attrs | DMA_ATTR_SKIP_CPU_SYNC);
+					 attrs | DMA_ATTR_SKIP_CPU_SYNC, pool);
 }
 #endif /* _KERNEL_DMA_DIRECT_H */
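Tracing dma_direct_unmap_page() through these hunks, the only remaining pool lookups on the unmap side are the two is_swiotlb_buffer() calls, and everything below them reuses the pointer:

	/* dma_direct_unmap_page(dev, addr, size, dir, attrs)
	 *  -> dma_direct_sync_single_for_cpu()
	 *       pool = is_swiotlb_buffer(dev, paddr);    lookup #1
	 *       swiotlb_sync_single_for_cpu(..., pool);  no re-lookup
	 *  -> pool = is_swiotlb_buffer(dev, phys);       lookup #2
	 *     swiotlb_tbl_unmap_single(..., pool);       bounce/del/release
	 *                                                all reuse pool
	 */
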
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index fe1ccb53596f..59b3e333651d 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -855,9 +855,8 @@  static unsigned int swiotlb_align_offset(struct device *dev,
  * Bounce: copy the swiotlb buffer from or back to the original dma location
  */
 static void swiotlb_bounce(struct device *dev, phys_addr_t tlb_addr, size_t size,
-			   enum dma_data_direction dir)
+			   enum dma_data_direction dir, struct io_tlb_pool *mem)
 {
-	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
 	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;
 	phys_addr_t orig_addr = mem->slots[index].orig_addr;
 	size_t alloc_size = mem->slots[index].alloc_size;
@@ -1435,13 +1434,13 @@  phys_addr_t swiotlb_tbl_map_single(struct device *dev, phys_addr_t orig_addr,
 	 * hardware behavior.  Use of swiotlb is supposed to be transparent,
 	 * i.e. swiotlb must not corrupt memory by clobbering unwritten bytes.
 	 */
-	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE);
+	swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_TO_DEVICE, pool);
 	return tlb_addr;
 }
 
-static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
+static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr,
+				  struct io_tlb_pool *mem)
 {
-	struct io_tlb_pool *mem = swiotlb_find_pool(dev, tlb_addr);
 	unsigned long flags;
 	unsigned int offset = swiotlb_align_offset(dev, 0, tlb_addr);
 	int index, nslots, aindex;
@@ -1505,11 +1504,9 @@  static void swiotlb_release_slots(struct device *dev, phys_addr_t tlb_addr)
  *
  * Return: %true if @tlb_addr belonged to a transient pool that was released.
  */
-static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
+static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr,
+				  struct io_tlb_pool *pool)
 {
-	struct io_tlb_pool *pool;
-
-	pool = swiotlb_find_pool(dev, tlb_addr);
 	if (!pool->transient)
 		return false;
 
@@ -1522,7 +1519,8 @@  static bool swiotlb_del_transient(struct device *dev, phys_addr_t tlb_addr)
 #else  /* !CONFIG_SWIOTLB_DYNAMIC */
 
 static inline bool swiotlb_del_transient(struct device *dev,
-					 phys_addr_t tlb_addr)
+					 phys_addr_t tlb_addr,
+					 struct io_tlb_pool *pool)
 {
 	return false;
 }
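With the pool threaded through, the unmap path under !CONFIG_SWIOTLB_DYNAMIC reduces to a fixed sequence with no lookup anywhere in the chain:

	/* swiotlb_tbl_unmap_single(dev, tlb_addr, size, dir, attrs, pool)
	 *   swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE, pool);
	 *				(when dir/attrs call for a sync)
	 *   swiotlb_del_transient(...)	stub, constant false
	 *   swiotlb_release_slots(dev, tlb_addr, pool);
	 */
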
@@ -1534,34 +1532,34 @@  static inline bool swiotlb_del_transient(struct device *dev,
  */
 void swiotlb_tbl_unmap_single(struct device *dev, phys_addr_t tlb_addr,
 			      size_t mapping_size, enum dma_data_direction dir,
-			      unsigned long attrs)
+			      unsigned long attrs, struct io_tlb_pool *pool)
 {
 	/*
 	 * First, sync the memory before unmapping the entry
 	 */
 	if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
 	    (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL))
-		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, mapping_size, DMA_FROM_DEVICE, pool);
 
-	if (swiotlb_del_transient(dev, tlb_addr))
+	if (swiotlb_del_transient(dev, tlb_addr, pool))
 		return;
-	swiotlb_release_slots(dev, tlb_addr);
+	swiotlb_release_slots(dev, tlb_addr, pool);
 }
 
 void swiotlb_sync_single_for_device(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir)
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
 {
 	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
-		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, size, DMA_TO_DEVICE, pool);
 	else
 		BUG_ON(dir != DMA_FROM_DEVICE);
 }
 
 void swiotlb_sync_single_for_cpu(struct device *dev, phys_addr_t tlb_addr,
-		size_t size, enum dma_data_direction dir)
+		size_t size, enum dma_data_direction dir, struct io_tlb_pool *pool)
 {
 	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
-		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE);
+		swiotlb_bounce(dev, tlb_addr, size, DMA_FROM_DEVICE, pool);
 	else
 		BUG_ON(dir != DMA_TO_DEVICE);
 }
@@ -1586,7 +1584,8 @@  dma_addr_t swiotlb_map(struct device *dev, phys_addr_t paddr, size_t size,
 	dma_addr = phys_to_dma_unencrypted(dev, swiotlb_addr);
 	if (unlikely(!dma_capable(dev, dma_addr, size, true))) {
 		swiotlb_tbl_unmap_single(dev, swiotlb_addr, size, dir,
-			attrs | DMA_ATTR_SKIP_CPU_SYNC);
+			attrs | DMA_ATTR_SKIP_CPU_SYNC,
+			swiotlb_find_pool(dev, swiotlb_addr));
 		dev_WARN_ONCE(dev, 1,
 			"swiotlb addr %pad+%zu overflow (mask %llx, bus limit %llx).\n",
 			&dma_addr, size, *dev->dma_mask, dev->bus_dma_limit);
@@ -1774,11 +1773,13 @@  struct page *swiotlb_alloc(struct device *dev, size_t size)
 bool swiotlb_free(struct device *dev, struct page *page, size_t size)
 {
 	phys_addr_t tlb_addr = page_to_phys(page);
+	struct io_tlb_pool *pool;
 
-	if (!is_swiotlb_buffer(dev, tlb_addr))
+	pool = is_swiotlb_buffer(dev, tlb_addr);
+	if (!pool)
 		return false;
 
-	swiotlb_release_slots(dev, tlb_addr);
+	swiotlb_release_slots(dev, tlb_addr, pool);
 
 	return true;
 }
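Finally, a sketch of the address arithmetic that swiotlb_bounce() and swiotlb_release_slots() now perform on the pool they are handed, mirroring the lines above (slot_orig_addr() is an illustrative helper, not in the patch):

static phys_addr_t slot_orig_addr(struct io_tlb_pool *mem,
				  phys_addr_t tlb_addr)
{
	/* tlb_addr must lie inside @mem; callers guarantee this because
	 * @mem came from is_swiotlb_buffer()/swiotlb_find_pool(). */
	int index = (tlb_addr - mem->start) >> IO_TLB_SHIFT;

	return mem->slots[index].orig_addr;
}

This is also why passing the wrong pool would be unsafe: the slot index is relative to the pool's start, so the lookup and the use must refer to the same pool.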