Message ID | 20231129202143.1434-2-shiraz.saleem@intel.com
---|---
State | Accepted
Delegated to: | Jason Gunthorpe
Series | Fixes for 64K page size support
On Wed, Nov 29, 2023 at 02:21:41PM -0600, Shiraz Saleem wrote:
> be arbitrary, but for 64k pages, the VA may be offset by some
> number of HCA 4k pages and followed by some number of HCA 4k
> pages.
>
> The current iterator doesn't account for either the preceding
> 4k pages or the following 4k pages.
>
> Fix the issue by extending the ib_block_iter to contain
> the number of DMA pages like comment [1] says and by using
> __sg_advance to start the iterator at the first live HCA page.
>
> The changes are contained in a parallel set of iterator start
> and next functions that are umem aware and specific to umem,
> since there is one user of rdma_for_each_block() without
> a umem.
>
> These two fixes prevent the extra pages before and after the
> user MR data.
>
> Fix the preceding pages by using the __sg_advance field to start
> at the first 4k page containing MR data.
>
> Fix the following pages by saving the number of pgsz blocks in
> the iterator state and downcounting on each next.
>
> This fix allows for the elimination of the small page crutch noted
> in the Fixes.
>
> Fixes: 10c75ccb54e4 ("RDMA/umem: Prevent small pages from being returned by ib_umem_find_best_pgsz()")
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/rdma/ib_umem.h#n91 [1]
> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
> ---
>  drivers/infiniband/core/umem.c | 6 ------
>  include/rdma/ib_umem.h         | 9 ++++++++-
>  include/rdma/ib_verbs.h        | 1 +
>  3 files changed, 9 insertions(+), 7 deletions(-)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

I admit this fix looks so easy and simple I wonder if there was some
tricky reason why I didn't do it back then.. Oh well, I've forgotten,
I guess we will find out!

Jason
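[Editor's worked example, not part of the thread: a rough standalone model of
the new iterator-start arithmetic. The 64K PAGE_SIZE, 4K HCA page size, and
the example VA/length are illustrative only, and the umem iova is assumed to
equal the user VA.]

#include <stdio.h>
#include <stdint.h>

#define SYS_PAGE_SIZE	0x10000UL	/* 64K kernel PAGE_SIZE */
#define HCA_PGSZ	0x1000UL	/* 4K HCA page size */

static uint64_t align_down(uint64_t x, uint64_t a) { return x & ~(a - 1); }
static uint64_t align_up(uint64_t x, uint64_t a) { return align_down(x + a - 1, a); }

int main(void)
{
	uint64_t va = 0x7f0000003000UL;	/* MR starts 0x3000 into its 64K page */
	uint64_t len = 0x2000UL;	/* 8K of MR data */

	/* ib_umem_offset(): byte offset of the MR within its first system page */
	uint64_t umem_offset = va & (SYS_PAGE_SIZE - 1);

	/* __sg_advance: skip the leading 4K blocks that hold no MR data */
	uint64_t sg_advance = umem_offset & ~(HCA_PGSZ - 1);

	/* ib_umem_num_dma_blocks(): 4K blocks actually covered by the MR */
	uint64_t numblocks = (align_up(va + len, HCA_PGSZ) -
			      align_down(va, HCA_PGSZ)) / HCA_PGSZ;

	printf("advance %#llx bytes, emit %llu blocks (raw sgl walk of one 64K page: %lu)\n",
	       (unsigned long long)sg_advance,
	       (unsigned long long)numblocks,
	       SYS_PAGE_SIZE / HCA_PGSZ);
	return 0;
}

For this mapping the iterator now advances 0x3000 bytes and emits 2 blocks,
whereas a raw walk of the single 64K system page would emit 16 blocks: 3
leading and 11 trailing 4K blocks with no MR data, which is exactly the
"extra pages before and after the user MR data" the commit message describes.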
diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index f9ab671c8eda..07c571c7b699 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -96,12 +96,6 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
 		return page_size;
 	}
 
-	/* rdma_for_each_block() has a bug if the page size is smaller than the
-	 * page size used to build the umem. For now prevent smaller page sizes
-	 * from being returned.
-	 */
-	pgsz_bitmap &= GENMASK(BITS_PER_LONG - 1, PAGE_SHIFT);
-
 	/* The best result is the smallest page size that results in the minimum
 	 * number of required pages. Compute the largest page size that could
 	 * work based on VA address bits that don't change.
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 95896472a82b..726773747764 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -77,6 +77,13 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
 {
 	__rdma_block_iter_start(biter, umem->sgt_append.sgt.sgl,
 				umem->sgt_append.sgt.nents, pgsz);
+	biter->__sg_advance = ib_umem_offset(umem) & ~(pgsz - 1);
+	biter->__sg_numblocks = ib_umem_num_dma_blocks(umem, pgsz);
+}
+
+static inline bool __rdma_umem_block_iter_next(struct ib_block_iter *biter)
+{
+	return __rdma_block_iter_next(biter) && biter->__sg_numblocks--;
 }
 
 /**
@@ -92,7 +99,7 @@ static inline void __rdma_umem_block_iter_start(struct ib_block_iter *biter,
  */
 #define rdma_umem_for_each_dma_block(umem, biter, pgsz)                        \
 	for (__rdma_umem_block_iter_start(biter, umem, pgsz);                  \
-	     __rdma_block_iter_next(biter);)
+	     __rdma_umem_block_iter_next(biter);)
 
 #ifdef CONFIG_INFINIBAND_USER_MEM
 
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index fb1a2d6b1969..b7b6b58dd348 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -2850,6 +2850,7 @@ struct ib_block_iter {
 	/* internal states */
 	struct scatterlist *__sg;	/* sg holding the current aligned block */
 	dma_addr_t __dma_addr;		/* unaligned DMA address of this block */
+	size_t __sg_numblocks;		/* ib_umem_num_dma_blocks() */
 	unsigned int __sg_nents;	/* number of SG entries */
 	unsigned int __sg_advance;	/* number of bytes to advance in sg in next step */
 	unsigned int __pg_bit;		/* alignment of current block */
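[Editor's usage sketch, not from the patch or any in-tree driver: roughly how
a provider driver consumes the umem-aware iterator. struct my_mr,
my_mr_fill_pbl() and the pbl array are hypothetical names for illustration;
with this fix the loop visits exactly ib_umem_num_dma_blocks(umem, pgsz)
blocks, starting at the first 4K block that holds MR data.]

#include <rdma/ib_umem.h>
#include <rdma/ib_verbs.h>

/* Hypothetical per-MR state; a real driver would allocate and map this. */
struct my_mr {
	u64 *pbl;	/* page buffer list handed to the HCA */
};

/* Record the DMA address of every HCA-sized block covered by the MR. */
static void my_mr_fill_pbl(struct my_mr *mr, struct ib_umem *umem,
			   unsigned long pgsz)
{
	struct ib_block_iter biter;
	u64 *pbl = mr->pbl;

	rdma_umem_for_each_dma_block(umem, &biter, pgsz)
		*pbl++ = rdma_block_iter_dma_address(&biter);
}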