Message ID | 20190419134353.12684-1-shiraz.saleem@intel.com (mailing list archive)
---|---
Series | Introduce a DMA block iterator
On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>
> This patch set is aiming to allow drivers to leverage a new DMA
> block iterator to get contiguous aligned memory blocks within
> their HW supported page sizes. The motivation for this work comes
> from the discussion in [1].
[.....]

Gal? Does this work for you now?

At this point this only impacts two drivers that are presumably tested
by their authors, so I'd like to merge it to finally get rid of the
hugetlb flag.. But EFA will need to use it too

Jason
On 22-Apr-19 20:10, Jason Gunthorpe wrote:
> On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
>> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>>
>> This patch set is aiming to allow drivers to leverage a new DMA
>> block iterator to get contiguous aligned memory blocks within
>> their HW supported page sizes. The motivation for this work comes
>> from the discussion in [1].
[.....]
>
> Gal? Does this work for you now?
>
> At this point this only impacts two drivers that are presumably tested
> by their authors, so I'd like to merge it to finally get rid of the
> hugetlb flag.. But EFA will need to use it too

I'm still running some tests internally, but AFAICT, everything works fine.
There are a few use-cases where the return values differ, but nothing
breaks so I'm fine with using Shiraz's work.

The DMA iterator is really helpful, there are two different occurrences
where I use it in EFA.
I'll send a separate patch for the EFA bits and my Tested-by when I finish
my testing (and some cleanups to the patch).

Regarding the only two drivers impact, I'm pretty sure mlx5_ib_cont_pages()
should be replaced with Shiraz's work as well, right? That would provide
more testing confidence.

BTW, please make sure to CC me if I'm supposed to chime in on the
discussion :), thanks!
>Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
>
>On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
>> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>>
>> This patch set is aiming to allow drivers to leverage a new DMA block
>> iterator to get contiguous aligned memory blocks within their HW
>> supported page sizes. The motivation for this work comes from the
>> discussion in [1].
[.....]
>
>Gal? Does this work for you now?
>
>At this point this only impacts two drivers that are presumably tested by their
>authors, so I'd like to merge it to finally get rid of the hugetlb flag.. But EFA will
>need to use it too
>

Selvin - It would be good if you could retest this version of the series with
bnxt_re too since there were some design changes to core algorithms.

Shiraz
On Mon, Apr 22, 2019 at 09:33:54PM +0300, Gal Pressman wrote:
> The DMA iterator is really helpful, there are two different occurrences where I
> use it in EFA.
> I'll send a separate patch for the EFA bits and my Tested-by when I finish my
> testing (and some cleanups to the patch).
>
> Regarding the only two drivers impact, I'm pretty sure mlx5_ib_cont_pages()
> should be replaced with Shiraz's work as well, right?

Yes, something like that. At the moment I'm mostly interested in getting
rid of the hugetlb flag, and Shiraz is trying to add huge page support to
i40iw..

I am hoping other drivers will eventually use this API as well, as I think
most of them would be better off.

Jason
On Tue, Apr 23, 2019 at 12:13 AM Saleem, Shiraz <shiraz.saleem@intel.com> wrote:
>
> >Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
> >
> >On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
> >> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
> >>
> >> This patch set is aiming to allow drivers to leverage a new DMA block
> >> iterator to get contiguous aligned memory blocks within their HW
> >> supported page sizes. The motivation for this work comes from the
> >> discussion in [1].
[.....]
> >
> >Gal? Does this work for you now?
> >
> >At this point this only impacts two drivers that are presumably tested by their
> >authors, so I'd like to merge it to finally get rid of the hugetlb flag.. But EFA will
> >need to use it too
> >
>
> Selvin - It would be good if you could retest this version of the series with bnxt_re too since
> there were some design changes to core algorithms.
>
> Shiraz
>

Series tested with bnxt_re. Looks good with my testing.

Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>
On 19-Apr-19 16:43, Shiraz Saleem wrote:
> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>
> This patch set is aiming to allow drivers to leverage a new DMA
> block iterator to get contiguous aligned memory blocks within
> their HW supported page sizes. The motivation for this work comes
> from the discussion in [1].
[.....]

Tested the series with EFA, looks good.

Tested-by: Gal Pressman <galpress@amazon.com>
>Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
>
>On Tue, Apr 23, 2019 at 12:13 AM Saleem, Shiraz <shiraz.saleem@intel.com> wrote:
>>
>> >Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
>> >
>> >On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
>> >> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>> >>
>> >> This patch set is aiming to allow drivers to leverage a new DMA
>> >> block iterator to get contiguous aligned memory blocks within their
>> >> HW supported page sizes. The motivation for this work comes from
>> >> the discussion in [1].
[.....]
>
>Series tested with bnxt_re. Looks good with my testing.
>
>Tested-by: Selvin Xavier <selvin.xavier@broadcom.com>

Thanks Selvin!
>Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
>
>On 19-Apr-19 16:43, Shiraz Saleem wrote:
>> From: "Shiraz Saleem" <shiraz.saleem@intel.com>
>>
>> This patch set is aiming to allow drivers to leverage a new DMA block
>> iterator to get contiguous aligned memory blocks within their HW
>> supported page sizes. The motivation for this work comes from the
>> discussion in [1].
[.....]
>
>Tested the series with EFA, looks good.
>
>Tested-by: Gal Pressman <galpress@amazon.com>

Thanks Gal!
From: "Shiraz Saleem" <shiraz.saleem@intel.com> This patch set is aiming to allow drivers to leverage a new DMA block iterator to get contiguous aligned memory blocks within their HW supported page sizes. The motivation for this work comes from the discussion in [1]. The first patch introduces a new umem API that allows drivers to find a best supported page size to use for the MR, from a bitmap of HW supported page sizes. The second patch introduces a new DMA block iterator that returns allows drivers to get aligned DMA addresses within a HW supported page size. The third patch and fouth patch removes the dependency of i40iw and bnxt_re drivers on the hugetlb flag. The new core APIs are called in these drivers to get huge page size aligned addresses if the MR is backed by huge pages. The sixth patch removes the hugetlb flag from IB core. Please note that mixed page portion of the algorithm and bnxt_re update in patch #4 have not been tested on hardware. [1] https://patchwork.kernel.org/patch/10499753/ RFC-->v0: --------- * Add to scatter table by iterating a limited sized page list. * Updated driver call sites to use the for_each_sg_page iterator variant where applicable. * Tweaked algorithm in ib_umem_find_single_pg_size and ib_umem_next_phys_iter to ignore alignment of the start of first SGE and end of the last SGE. * Simplified ib_umem_find_single_pg_size on offset alignments checks for user-space virtual and physical buffer. * Updated ib_umem_start_phys_iter to do some pre-computation for the non-mixed page support case. * Updated bnxt_re driver to use the new core APIs and remove its dependency on the huge tlb flag. * Fixed a bug in computation of sg_phys_iter->phyaddr in ib_umem_next_phys_iter. * Drop hugetlb flag usage from RDMA subsystem. * Rebased on top of for-next. v0-->v1: -------- * Remove the patches that update driver to use for_each_sg_page variant to iterate in the SGE. This is sent as a seperate series using the for_each_sg_dma_page variant. * Tweak ib_umem_add_sg_table API defintion based on maintainer feedback. * Cache number of scatterlist entries in umem. * Update function headers for ib_umem_find_single_pg_size and ib_umem_next_phys_iter. * Add sanity check on supported_pgsz in ib_umem_find_single_pg_size. v1-->v2: -------- *Removed page combining patch as it was sent stand alone. *__fls on pgsz_bitmap as opposed to fls64 since it's an unsigned long. *rename ib_umem_find_pg_bit() --> rdma_find_pg_bit() and moved to ib_verbs.h *rename ib_umem_find_single_pg_size() --> ib_umem_find_best_pgsz() *New flag IB_UMEM_VA_BASED_OFFSET for ib_umem_find_best_pgsz API for HW that uses least significant bits of VA to indicate start offset into DMA list. *rdma_find_pg_bit() logic is re-written and simplified. It can support input of 0 or 1 dma addr cases. *ib_umem_find_best_pgsz() optimized to be less computationally expensive running rdma_find_pg_bit() only once. *rdma_for_each_block() is the new re-designed DMA block iterator which is more in line with for_each_sg_dma_page()iterator. *rdma_find_mixed_pg_bit() logic for interior SGE's accounting for start and end dma address. 
*remove i40iw specific enums for supported page size *remove vma_list form ib_umem_get() Shiraz Saleem (5): RDMA/umem: Add API to find best driver supported page size in an MR RDMA/verbs: Add a DMA iterator to return aligned contiguous memory blocks RDMA/i40iw: Use core helpers to get aligned DMA address within a supported page size RDMA/bnxt_re: Use core helpers to get aligned DMA address RDMA/umem: Remove hugetlb flag drivers/infiniband/core/umem.c | 83 +++++++++++++++++++++---------- drivers/infiniband/core/umem_odp.c | 3 -- drivers/infiniband/core/verbs.c | 68 +++++++++++++++++++++++++ drivers/infiniband/hw/bnxt_re/ib_verbs.c | 27 ++++------ drivers/infiniband/hw/i40iw/i40iw_verbs.c | 47 +++-------------- drivers/infiniband/hw/i40iw/i40iw_verbs.h | 3 +- include/rdma/ib_umem.h | 20 +++++++- include/rdma/ib_verbs.h | 81 ++++++++++++++++++++++++++++++ 8 files changed, 245 insertions(+), 87 deletions(-)
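As referenced above, here is a minimal sketch of how a driver might combine
the two new helpers: pick the best HW-supported page size for an MR with
ib_umem_find_best_pgsz(), then walk the umem in blocks of that size with
rdma_for_each_block(). The names follow the forms proposed by the series
(and eventually merged upstream); the exact v2 signatures (e.g. the
IB_UMEM_VA_BASED_OFFSET flag), the umem field names, and the pbl array are
assumptions for illustration, not the actual i40iw/bnxt_re conversion code.

```c
#include <linux/sizes.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_verbs.h>

/*
 * Illustrative sketch only: fill a caller-provided page buffer list
 * (assumed large enough) with one DMA address per HW-sized block.
 */
static int example_fill_pbl(struct ib_umem *umem, u64 virt_addr,
			    u64 *pbl, unsigned long *out_pgsz)
{
	struct ib_block_iter biter;
	unsigned long pgsz;
	int i = 0;

	/*
	 * Pick the largest page size, from the HW-supported bitmap, that
	 * still aligns with the umem's SGEs and the MR's starting VA.
	 */
	pgsz = ib_umem_find_best_pgsz(umem, SZ_4K | SZ_2M | SZ_1G, virt_addr);
	if (!pgsz)
		return -EINVAL;

	/*
	 * Each iteration yields a pgsz-aligned DMA address, independent of
	 * how the underlying scatterlist entries happen to be split up.
	 */
	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap, pgsz)
		pbl[i++] = rdma_block_iter_dma_address(&biter);

	*out_pgsz = pgsz;
	return 0;
}
```

This is roughly the pattern the i40iw and bnxt_re conversions in patches 3
and 4 follow: whether the MR is backed by 4K, 2M, or 1G pages falls out of
the page-size selection, so the per-MR hugetlb flag is no longer needed.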