From patchwork Fri Oct 19 23:34:06 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650225
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 1/4] RDMA/umem: Minimize SG table entries
Date: Fri, 19 Oct 2018 18:34:06 -0500
Message-Id: <20181019233409.1104-2-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>
X-Mailing-List: linux-rdma@vger.kernel.org

Squash contiguous regions of PAGE_SIZE pages into a single SG entry, as
opposed to one SG entry per page. This reduces the size of the SG table
and is friendlier to the IOMMU.

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c | 66 ++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 35 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index c6144df..486d6d7 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 #include
 
 #include "uverbs.h"
@@ -46,18 +47,16 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty)
 {
-	struct scatterlist *sg;
+	struct sg_page_iter sg_iter;
 	struct page *page;
-	int i;
 
 	if (umem->nmap > 0)
 		ib_dma_unmap_sg(dev, umem->sg_head.sgl,
-				umem->npages,
+				umem->sg_head.orig_nents,
 				DMA_BIDIRECTIONAL);
 
-	for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) {
-
-		page = sg_page(sg);
+	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->sg_head.orig_nents, 0) {
+		page = sg_page_iter_page(&sg_iter);
 		if (!PageDirty(page) && umem->writable && dirty)
 			set_page_dirty_lock(page);
 		put_page(page);
@@ -92,7 +91,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	int ret;
 	int i;
 	unsigned long dma_attrs = 0;
-	struct scatterlist *sg, *sg_list_start;
 	unsigned int gup_flags = FOLL_WRITE;
 
 	if (dmasync)
@@ -138,7 +136,13 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	/* We assume the memory is from hugetlb until proved otherwise */
 	umem->hugetlb = 1;
 
-	page_list = (struct page **) __get_free_page(GFP_KERNEL);
+	npages = ib_umem_num_pages(umem);
+	if (npages == 0 || npages > UINT_MAX) {
+		ret = -EINVAL;
+		goto umem_kfree;
+	}
+
+	page_list = kmalloc_array(npages, sizeof(*page_list), GFP_KERNEL);
 	if (!page_list) {
 		ret = -ENOMEM;
 		goto umem_kfree;
@@ -152,12 +156,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	if (!vma_list)
 		umem->hugetlb = 0;
 
-	npages = ib_umem_num_pages(umem);
-	if (npages == 0 || npages > UINT_MAX) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 
 	down_write(&mm->mmap_sem);
@@ -172,50 +170,48 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 
 	cur_base = addr & PAGE_MASK;
 
-	ret = sg_alloc_table(&umem->sg_head, npages, GFP_KERNEL);
-	if (ret)
-		goto vma;
-
 	if (!umem->writable)
 		gup_flags |= FOLL_FORCE;
 
-	sg_list_start = umem->sg_head.sgl;
-
 	while (npages) {
 		down_read(&mm->mmap_sem);
 		ret = get_user_pages_longterm(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags, page_list, vma_list);
+				     gup_flags, page_list + umem->npages, vma_list);
 		if (ret < 0) {
 			up_read(&mm->mmap_sem);
-			goto umem_release;
+			release_pages(page_list, umem->npages);
+			goto vma;
 		}
 
 		umem->npages += ret;
 		cur_base += ret * PAGE_SIZE;
 		npages -= ret;
 
-		/* Continue to hold the mmap_sem as vma_list access
-		 * needs to be protected.
-		 */
-		for_each_sg(sg_list_start, sg, ret, i) {
+		for(i = 0; i < ret && umem->hugetlb; i++) {
 			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
 				umem->hugetlb = 0;
-
-			sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
 		}
 		up_read(&mm->mmap_sem);
+	}
 
-		/* preparing for next loop */
-		sg_list_start = sg;
+	ret = sg_alloc_table_from_pages(&umem->sg_head,
+					page_list,
+					umem->npages,
+					0,
+					umem->npages << PAGE_SHIFT,
+					GFP_KERNEL);
+	if (ret) {
+		release_pages(page_list, umem->npages);
+		goto vma;
 	}
 
 	umem->nmap = ib_dma_map_sg_attrs(context->device,
-				  umem->sg_head.sgl,
-				  umem->npages,
-				  DMA_BIDIRECTIONAL,
-				  dma_attrs);
+					 umem->sg_head.sgl,
+					 umem->sg_head.orig_nents,
+					 DMA_BIDIRECTIONAL,
+					 dma_attrs);
 
 	if (!umem->nmap) {
 		ret = -ENOMEM;
@@ -234,7 +230,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 out:
 	if (vma_list)
 		free_page((unsigned long) vma_list);
-	free_page((unsigned long) page_list);
+	kfree(page_list);
 umem_kfree:
 	if (ret) {
 		mmdrop(umem->owning_mm);
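
The effect of the coalescing can be seen with a small standalone sketch
(illustrative only, not part of the patch): it counts how many SG entries a
pinned region needs when physically contiguous PAGE_SIZE pages are merged,
which is what sg_alloc_table_from_pages() now does for the umem SG table.
The page-frame numbers below are made-up example inputs.

/* Build with: cc -std=c99 -o sg_count sg_count.c */
#include <stdio.h>
#include <stddef.h>

static size_t sg_entries_coalesced(const unsigned long *pfn, size_t npages)
{
	size_t entries = 0;

	for (size_t i = 0; i < npages; i++) {
		/* start a new entry only when this page is not adjacent to the previous one */
		if (i == 0 || pfn[i] != pfn[i - 1] + 1)
			entries++;
	}
	return entries;
}

int main(void)
{
	/* two physically contiguous runs: 512 pages and 3 pages */
	unsigned long pfn[515];
	size_t i;

	for (i = 0; i < 512; i++)
		pfn[i] = 0x1000 + i;
	for (; i < 515; i++)
		pfn[i] = 0x9000 + (i - 512);

	printf("one entry per page : %zu\n", sizeof(pfn) / sizeof(pfn[0]));
	printf("coalesced entries  : %zu\n", sg_entries_coalesced(pfn, 515));
	return 0;
}

For a physically contiguous 2MB region this collapses 512 per-page entries
into a single SG entry, which is what shrinks the table and helps the IOMMU.
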
From patchwork Fri Oct 19 23:34:07 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650227
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 2/4] RDMA/umem: Add API to find best driver supported page size in an MR
Date: Fri, 19 Oct 2018 18:34:07 -0500
Message-Id: <20181019233409.1104-3-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>
X-Mailing-List: linux-rdma@vger.kernel.org

This helper iterates through the SG list to find the best page size to
use from a bitmap of HW-supported page sizes. Drivers that support
multiple page sizes, but not mixed page sizes within an MR, can call
this API.

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c | 95 ++++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h         |  7 ++++
 2 files changed, 102 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 486d6d7..04071b5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -66,6 +66,101 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 }
 
 /**
+ * ib_umem_find_pg_bit - Find the page bit to use for phyaddr
+ *
+ * @phyaddr: Physical address after DMA translation
+ * @supported_pgsz: bitmask of HW supported page sizes
+ */
+static int ib_umem_find_pg_bit(unsigned long phyaddr,
+			       unsigned long supported_pgsz)
+{
+	unsigned long num_zeroes;
+	int pg_bit;
+
+	/* Trailing zero bits in the address */
+	num_zeroes = __ffs(phyaddr);
+
+	/* Find page bit such that phyaddr is aligned to the highest supported
+	 * HW page size
+	 */
+	pg_bit = fls64(supported_pgsz & (BIT_ULL(num_zeroes + 1) - 1)) - 1;
+
+	return pg_bit;
+}
+
+/**
+ * ib_umem_find_single_pg_size - Find best HW page size to use for this MR
+ * @umem: umem struct
+ * @supported_pgsz: bitmask of HW supported page sizes
+ *
+ * This helper is intended for HW that support multiple page
+ * sizes but can do only a single page size in an MR.
+ */
+unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
+					  unsigned long supported_pgsz)
+{
+	struct scatterlist *sg;
+	unsigned long dma_addr_start, dma_addr_end;
+	unsigned long uvirt_offset, phy_offset;
+	unsigned long pg_mask, bitmap;
+	int pg_bit_start, pg_bit_end, pg_bit_sg_chunk;
+	int lowest_pg_bit, best_pg_bit;
+	int i;
+
+	if (!supported_pgsz)
+		return 0;
+
+	lowest_pg_bit = __ffs(supported_pgsz);
+	best_pg_bit = fls64(supported_pgsz) - 1;
+
+	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.orig_nents, i) {
+		dma_addr_start = sg_dma_address(sg);
+		dma_addr_end = sg_dma_address(sg) + sg_dma_len(sg);
+		pg_bit_start = ib_umem_find_pg_bit(dma_addr_start, supported_pgsz);
+		pg_bit_end = ib_umem_find_pg_bit(dma_addr_end, supported_pgsz);
+
+		if (!i) {
+			pg_bit_sg_chunk = max_t(int, pg_bit_start, pg_bit_end);
+			bitmap = supported_pgsz;
+			/* The start offset of the MR into a first _large_ page
+			 * should line up exactly for the user-space virtual buf
+			 * and physical buffer, in order to upgrade the page bit
+			 */
+			while (pg_bit_sg_chunk > PAGE_SHIFT) {
+				pg_mask = ~((1 << pg_bit_sg_chunk) - 1);
+				uvirt_offset = umem->address & ~pg_mask;
+				phy_offset = (dma_addr_start + ib_umem_offset(umem)) &
+					      ~pg_mask;
+				if (uvirt_offset == phy_offset)
+					break;
+
+				/* Retry with next supported page size */
+				clear_bit(pg_bit_sg_chunk, &bitmap);
+				pg_bit_sg_chunk = fls64(bitmap) - 1;
+			}
+		} else if (i == (umem->sg_head.orig_nents - 1)) {
+			/* last SG chunk: Does not matter if MR ends at an
+			 * unaligned offset.
+			 */
+			pg_bit_sg_chunk = pg_bit_start;
+		} else {
+			pg_bit_sg_chunk = min_t(int, pg_bit_start, pg_bit_end);
+		}
+
+		best_pg_bit = min_t(int, best_pg_bit, pg_bit_sg_chunk);
+		if (best_pg_bit == lowest_pg_bit)
+			break;
+	}
+
+	/* best page bit cannot be less than the lowest supported HW size */
+	if (best_pg_bit < lowest_pg_bit)
+		return BIT_ULL(lowest_pg_bit);
+
+	return BIT_ULL(best_pg_bit);
+}
+EXPORT_SYMBOL(ib_umem_find_single_pg_size);
+
+/**
  * ib_umem_get - Pin and DMA map userspace memory.
  *
  * If access flags indicate ODP memory, avoid pinning. Instead, stores
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 5d3755e..24ba6c6 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -86,6 +86,8 @@ void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
+unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
+					  unsigned long supported_pgsz);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -102,6 +104,11 @@ static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offs
 		      size_t length) {
 	return -EINVAL;
 }
+static inline int ib_umem_find_single_pg_size(struct ib_umem *umem,
+					      unsigned long supported_pgsz) {
+	return -EINVAL;
+}
+
 #endif /* CONFIG_INFINIBAND_USER_MEM */
 
 #endif /* IB_UMEM_H */
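
The page-bit selection done by ib_umem_find_pg_bit() can be exercised outside
the kernel with a small sketch (illustrative only; it mirrors the patch's
logic using compiler builtins in place of __ffs()/fls64(), and the addresses
and supported-size bitmap are made-up inputs).

/* Build with: cc -o pgbit pgbit.c */
#include <stdio.h>

/* Return the largest supported page shift that the address is aligned to. */
static int find_pg_bit(unsigned long long phyaddr, unsigned long long supported_pgsz)
{
	int num_zeroes = __builtin_ctzll(phyaddr);	/* trailing zero bits, like __ffs() */
	unsigned long long mask = (num_zeroes >= 63) ?
			~0ULL : ((1ULL << (num_zeroes + 1)) - 1);
	unsigned long long candidates = supported_pgsz & mask;

	return candidates ? 63 - __builtin_clzll(candidates) : -1; /* like fls64() - 1 */
}

int main(void)
{
	unsigned long long supported = (1ULL << 12) | (1ULL << 21);	/* 4K and 2M */

	printf("0x200000 -> bit %d\n", find_pg_bit(0x200000, supported)); /* 2M aligned -> 21 */
	printf("0x201000 -> bit %d\n", find_pg_bit(0x201000, supported)); /* only 4K aligned -> 12 */
	return 0;
}

A 2M-aligned DMA address can be covered by the 2M page size, while an address
that is only 4K aligned falls back to the 4K bit; ib_umem_find_single_pg_size()
then takes the minimum such bit across all SG chunks.
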
From patchwork Fri Oct 19 23:34:08 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650223
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 3/4] RDMA/umem: Add API to return optimal HW DMA addresses from SG list
Date: Fri, 19 Oct 2018 18:34:08 -0500
Message-Id: <20181019233409.1104-4-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>
X-Mailing-List: linux-rdma@vger.kernel.org

This helper iterates over the SG list and returns suitable HW-aligned
DMA addresses within a driver-supported page size. The implementation
is intended to work for HW that supports either a single page size or
mixed page sizes in an MR. This avoids the need for driver-specific
algorithms that achieve the same thing, and for redundant walks of the
SG list.

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c | 68 ++++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h         | 19 ++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 04071b5..cba79ab 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -160,6 +160,74 @@ unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
 }
 EXPORT_SYMBOL(ib_umem_find_single_pg_size);
 
+void ib_umem_start_phys_iter(struct ib_umem *umem,
+			     struct sg_phys_iter *sg_phys_iter)
+{
+	memset(sg_phys_iter, 0, sizeof(struct sg_phys_iter));
+	sg_phys_iter->sg = umem->sg_head.sgl;
+}
+EXPORT_SYMBOL(ib_umem_start_phys_iter);
+
+/**
+ * ib_umem_next_phys_iter - SG list iterator that returns aligned HW address
+ * @umem: umem struct
+ * @sg_phys_iter: SG HW address iterator
+ * @supported_pgsz: bitmask of HW supported page sizes
+ *
+ * This helper iterates over the SG list and returns the HW
+ * address aligned to a supported HW page size.
+ *
+ * The algorithm differs slightly between HW that supports single
+ * page sizes vs mixed page sizes in an MR. For example, if an
+ * MR of size 4M-4K, starts at an offset PAGE_SIZE (ex: 4K) into
+ * a 2M page; HW that supports multiple page sizes (ex: 4K, 2M)
+ * would get 511 4K pages and one 2M page. Single page support
+ * HW would get back two 2M pages or 1023 4K pages.
+ */
+bool ib_umem_next_phys_iter(struct ib_umem *umem,
+			    struct sg_phys_iter *sg_phys_iter,
+			    unsigned long supported_pgsz)
+{
+	unsigned long pg_mask, offset;
+	int pg_bit;
+
+	if (!sg_phys_iter->sg || !supported_pgsz)
+		return false;
+
+	if (sg_phys_iter->remaining) {
+		sg_phys_iter->phyaddr += sg_phys_iter->len;
+	} else {
+		sg_phys_iter->phyaddr = sg_dma_address(sg_phys_iter->sg);
+		sg_phys_iter->remaining = sg_dma_len(sg_phys_iter->sg);
+	}
+
+	/* Single page support in MR */
+	if (hweight_long(supported_pgsz) == 1) {
+		pg_bit = fls64(supported_pgsz) - 1;
+	} else {
+		/* Mixed page support in MR*/
+		pg_bit = ib_umem_find_pg_bit(sg_phys_iter->phyaddr,
+					     supported_pgsz);
+	}
+
+	/* page bit cannot be less than the lowest supported HW size */
+	if (WARN_ON(pg_bit < __ffs(supported_pgsz)))
+		return false;
+
+	pg_mask = ~((1 << pg_bit) - 1);
+
+	offset = sg_phys_iter->phyaddr & ~pg_mask;
+	sg_phys_iter->phyaddr = sg_phys_iter->phyaddr & pg_mask;
+	sg_phys_iter->len = min_t(int, sg_phys_iter->remaining,
+				  (1 << (pg_bit)) - offset);
+	sg_phys_iter->remaining -= sg_phys_iter->len;
+	if (!sg_phys_iter->remaining)
+		sg_phys_iter->sg = sg_next(sg_phys_iter->sg);
+
+	return true;
+}
+EXPORT_SYMBOL(ib_umem_next_phys_iter);
+
 /**
  * ib_umem_get - Pin and DMA map userspace memory.
  *
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 24ba6c6..8114fd1 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -55,6 +55,13 @@ struct ib_umem {
 	int             npages;
 };
 
+struct sg_phys_iter {
+	struct scatterlist *sg;
+	unsigned long phyaddr;
+	size_t len;
+	unsigned int remaining;
+};
+
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
@@ -88,6 +95,11 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
 unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
 					  unsigned long supported_pgsz);
+void ib_umem_start_phys_iter(struct ib_umem *umem,
+			     struct sg_phys_iter *sg_phys_iter);
+bool ib_umem_next_phys_iter(struct ib_umem *umem,
+			    struct sg_phys_iter *sg_phys_iter,
+			    unsigned long supported_pgsz);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -108,6 +120,13 @@ static inline int ib_umem_find_single_pg_size(struct ib_umem *umem,
 					      unsigned long supported_pgsz) {
 	return -EINVAL;
 }
+static inline void ib_umem_start_phys_iter(struct ib_umem *umem,
+					   struct sg_phys_iter *sg_phys_iter) { }
+static inline bool ib_umem_next_phys_iter(struct ib_umem *umem,
+					  struct sg_phys_iter *sg_phys_iter,
+					  unsigned long supported_pgsz) {
+	return false;
+}
 
 #endif /* CONFIG_INFINIBAND_USER_MEM */
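
A driver-side sketch of the intended calling pattern (illustrative only,
modeled on the i40iw conversion in patch 4/4; example_fill_pbl() and
write_pbl_entry() are hypothetical placeholders for driver specifics):

/*
 * Walk an already DMA-mapped umem and hand one HW-aligned address per
 * supported-page-size chunk to the device's page-buffer list.
 */
static void example_fill_pbl(struct ib_umem *umem, unsigned long pg_sz_bitmap,
			     void (*write_pbl_entry)(u64 addr))
{
	struct sg_phys_iter iter;

	for (ib_umem_start_phys_iter(umem, &iter);
	     ib_umem_next_phys_iter(umem, &iter, pg_sz_bitmap);)
		write_pbl_entry(iter.phyaddr);	/* aligned to a supported page size */
}

The same loop works whether pg_sz_bitmap holds a single page size (as with
ib_umem_find_single_pg_size()) or a mix of sizes; the iterator chooses the
alignment per chunk.
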
From patchwork Fri Oct 19 23:34:09 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650229
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 4/4] RDMA/i40iw: Use umem APIs to retrieve optimal HW address
Date: Fri, 19 Oct 2018 18:34:09 -0500
Message-Id: <20181019233409.1104-5-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>
X-Mailing-List: linux-rdma@vger.kernel.org

Call the core helpers to retrieve the optimal HW-aligned address to use
for the MR, within a supported i40iw page size.

Remove the code in i40iw that determines whether the MR is backed by 2M
huge pages, which involves checking the umem->hugetlb flag and
inspecting the VMA. The core helpers will return the 2M-aligned address
if the MR is backed by 2M pages.

Fixes: f26c7c83395b ("i40iw: Add 2MB page support")
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/hw/i40iw/i40iw_user.h  |  5 +++
 drivers/infiniband/hw/i40iw/i40iw_verbs.c | 58 +++++++------------------------
 2 files changed, 17 insertions(+), 46 deletions(-)

diff --git a/drivers/infiniband/hw/i40iw/i40iw_user.h b/drivers/infiniband/hw/i40iw/i40iw_user.h
index b125925..09fdcee 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_user.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_user.h
@@ -80,6 +80,11 @@ enum i40iw_device_capabilities_const {
 	I40IW_MAX_PDS = 32768
 };
 
+enum i40iw_supported_page_size {
+	I40IW_PAGE_SZ_4K = 0x00001000,
+	I40IW_PAGE_SZ_2M = 0x00200000
+};
+
 #define i40iw_handle void *
 #define i40iw_adapter_handle i40iw_handle
 #define i40iw_qp_handle i40iw_handle
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index cb2aef8..a2ecf9e 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1371,55 +1371,22 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr,
 {
 	struct ib_umem *region = iwmr->region;
 	struct i40iw_pbl *iwpbl = &iwmr->iwpbl;
-	int chunk_pages, entry, i;
 	struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc;
 	struct i40iw_pble_info *pinfo;
-	struct scatterlist *sg;
-	u64 pg_addr = 0;
+	struct sg_phys_iter sg_phys_iter;
 	u32 idx = 0;
 
 	pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf;
 
-	for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) {
-		chunk_pages = sg_dma_len(sg) >> region->page_shift;
-		if ((iwmr->type == IW_MEMREG_TYPE_QP) &&
-		    !iwpbl->qp_mr.sq_page)
-			iwpbl->qp_mr.sq_page = sg_page(sg);
-		for (i = 0; i < chunk_pages; i++) {
-			pg_addr = sg_dma_address(sg) +
-				  (i << region->page_shift);
-
-			if ((entry + i) == 0)
-				*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
-			else if (!(pg_addr & ~iwmr->page_msk))
-				*pbl = cpu_to_le64(pg_addr);
-			else
-				continue;
-			pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx);
-		}
-	}
-}
+	if (iwmr->type == IW_MEMREG_TYPE_QP)
+		iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl);
 
-/**
- * i40iw_set_hugetlb_params - set MR pg size and mask to huge pg values.
- * @addr: virtual address
- * @iwmr: mr pointer for this memory registration
- */
-static void i40iw_set_hugetlb_values(u64 addr, struct i40iw_mr *iwmr)
-{
-	struct vm_area_struct *vma;
-	struct hstate *h;
-
-	down_read(&current->mm->mmap_sem);
-	vma = find_vma(current->mm, addr);
-	if (vma && is_vm_hugetlb_page(vma)) {
-		h = hstate_vma(vma);
-		if (huge_page_size(h) == 0x200000) {
-			iwmr->page_size = huge_page_size(h);
-			iwmr->page_msk = huge_page_mask(h);
-		}
+	for (ib_umem_start_phys_iter(region, &sg_phys_iter);
+	     ib_umem_next_phys_iter(region, &sg_phys_iter, iwmr->page_size);) {
+		*pbl = cpu_to_le64(sg_phys_iter.phyaddr);
+		pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx);
 	}
-	up_read(&current->mm->mmap_sem);
+
 }
 
 /**
@@ -1876,11 +1843,10 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	iwmr->ibmr.device = pd->device;
 	ucontext = to_ucontext(pd->uobject->context);
 
-	iwmr->page_size = PAGE_SIZE;
-	iwmr->page_msk = PAGE_MASK;
-
-	if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM))
-		i40iw_set_hugetlb_values(start, iwmr);
+	iwmr->page_size = I40IW_PAGE_SZ_4K;
+	if (req.reg_type == IW_MEMREG_TYPE_MEM)
+		iwmr->page_size = ib_umem_find_single_pg_size(region,
+					I40IW_PAGE_SZ_4K | I40IW_PAGE_SZ_2M);
 
 	region_length = region->length + (start & (iwmr->page_size - 1));
 	pg_shift = ffs(iwmr->page_size) - 1;
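
The page-size arithmetic that i40iw_reg_user_mr() relies on can be checked
with a small standalone sketch (illustrative only; the start address and
length are made-up example inputs):

/* Build with: cc -o i40iw_pgcalc i40iw_pgcalc.c */
#include <stdio.h>
#include <strings.h>	/* ffs() */

int main(void)
{
	unsigned long page_size = 0x200000;	/* I40IW_PAGE_SZ_2M */
	unsigned long start = 0x7f3a00201000;	/* example user VA */
	unsigned long length = 0x3ff000;	/* example MR length (4M - 4K) */
	int pg_shift = ffs(page_size) - 1;
	unsigned long region_length = length + (start & (page_size - 1));

	printf("pg_shift = %d\n", pg_shift);			/* 21 for 2M pages */
	printf("region_length = 0x%lx\n", region_length);	/* 0x400000 */
	return 0;
}

With a 2M page size selected by ib_umem_find_single_pg_size(), the page shift
becomes 21 and the region length grows to cover the MR's start offset within
the first 2M page.
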