From patchwork Mon Dec 24 22:32:22 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742491
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 1/6] RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
Date: Mon, 24 Dec 2018 16:32:22 -0600
Message-Id: <20181224223227.18016-2-shiraz.saleem@intel.com>
In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com>
References: <20181224223227.18016-1-shiraz.saleem@intel.com>

Combine contiguous regions of PAGE_SIZE pages into single scatter list entries while adding to the scatter table. This minimizes the number of entries in the scatter list and reduces the DMA mapping overhead, particularly with the IOMMU.

Additionally, update driver call sites to use the for_each_sg_page variant to iterate through the pages in the SGE. Certain call sites assume a single page per SGE and therefore require the page iterator; in the others it is simply the cleaner choice.
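The key step is detecting runs of physically contiguous pages. The following standalone userspace sketch models the run-detection loop of ib_umem_add_sg_table() over a plain array of page frame numbers; the pfn values and the printf output are illustrative assumptions, not taken from the patch.

/* Standalone model of the pfn-run detection in ib_umem_add_sg_table():
 * walk an array of page frame numbers and coalesce runs of contiguous
 * frames into single (start, length) entries, as the patch does when
 * building the scatter table.
 */
#include <stdio.h>

int main(void)
{
	/* Hypothetical pinned pages: pfns 100-103 are contiguous, 200-201
	 * form a second run.
	 */
	unsigned long pfns[] = { 100, 101, 102, 103, 200, 201 };
	unsigned long npages = sizeof(pfns) / sizeof(pfns[0]);
	unsigned long i = 0;

	while (i != npages) {
		unsigned long first = pfns[i];
		unsigned long len;

		/* Count contiguous frames starting at i, mirroring the
		 * kernel loop "for (len = 0; i != npages && ...; i++, len++)".
		 */
		for (len = 0; i != npages && first + len == pfns[i];
		     i++, len++)
			;

		/* One combined SG entry instead of 'len' per-page entries */
		printf("sg entry: start pfn %lu, %lu page(s)\n", first, len);
	}
	return 0;
}

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J.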
Ruhl Signed-off-by: Shiraz Saleem Signed-off-by: Jason Gunthorpe --- drivers/infiniband/core/umem.c | 87 ++++++++--- drivers/infiniband/hw/bnxt_re/ib_verbs.c | 21 +-- drivers/infiniband/hw/bnxt_re/qplib_res.c | 9 +- drivers/infiniband/hw/cxgb3/iwch_provider.c | 27 ++-- drivers/infiniband/hw/cxgb4/mem.c | 31 ++-- drivers/infiniband/hw/hns/hns_roce_hw_v1.c | 7 +- drivers/infiniband/hw/hns/hns_roce_hw_v2.c | 25 ++- drivers/infiniband/hw/hns/hns_roce_mr.c | 88 +++++------ drivers/infiniband/hw/i40iw/i40iw_verbs.c | 35 ++--- drivers/infiniband/hw/mthca/mthca_provider.c | 33 ++-- drivers/infiniband/hw/nes/nes_verbs.c | 203 +++++++++++-------------- drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 54 +++---- drivers/infiniband/hw/qedr/verbs.c | 56 +++---- drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c | 21 +-- drivers/infiniband/sw/rdmavt/mr.c | 8 +- drivers/infiniband/sw/rxe/rxe_mr.c | 7 +- 16 files changed, 342 insertions(+), 370 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index c6144df..64bacc5 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -39,25 +39,23 @@ #include #include #include +#include #include #include "uverbs.h" - static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int dirty) { - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct page *page; - int i; + int nents = sg_nents(umem->sg_head.sgl); if (umem->nmap > 0) - ib_dma_unmap_sg(dev, umem->sg_head.sgl, - umem->npages, + ib_dma_unmap_sg(dev, umem->sg_head.sgl, nents, DMA_BIDIRECTIONAL); - for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) { - - page = sg_page(sg); + for_each_sg_page(umem->sg_head.sgl, &sg_iter, nents, 0) { + page = sg_page_iter_page(&sg_iter); if (!PageDirty(page) && umem->writable && dirty) set_page_dirty_lock(page); put_page(page); @@ -66,6 +64,60 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d sg_free_table(&umem->sg_head); } +/* ib_umem_add_sg_table - Add N contiguous pages to scatter table + * + * cur: current scatterlist entry + * nxt: next scatterlist entry + * page_list: array of npage struct page pointers + * npages: number of pages in page_list + */ +static void ib_umem_add_sg_table(struct scatterlist **cur, + struct scatterlist **nxt, + struct page **page_list, + unsigned long npages) +{ + unsigned long first_pfn; + unsigned long i = 0; + unsigned long cur_sglen = 0; + bool update_cur_sg = false; + + /* Check if new page_list is contiguous with end of previous page_list*/ + if (*cur != *nxt) { + cur_sglen = (*cur)->length >> PAGE_SHIFT; + first_pfn = page_to_pfn(sg_page(*cur)); + if (first_pfn + cur_sglen == page_to_pfn(page_list[0])) + update_cur_sg = true; + } + + while (i != npages) { + unsigned long len; + struct page *first_page = page_list[i]; + + first_pfn = page_to_pfn(first_page); + + /* Compute the number of contiguous pages we have starting + * at i + */ + for (len = 0; i != npages && + first_pfn + len == page_to_pfn(page_list[i]); + i++, len++); + + /* Squash first N contiguous pages from page_list into current + * SGE + */ + if (update_cur_sg) { + sg_set_page(*cur, sg_page(*cur), + (len + cur_sglen) * PAGE_SIZE, 0); + update_cur_sg = false; + /* Squash N contiguous pages into next SGE */ + } else { + sg_set_page(*nxt, first_page, len * PAGE_SIZE, 0); + *cur = *nxt; + *nxt = sg_next(*cur); + } + } +} + /** * ib_umem_get - Pin and DMA map userspace memory. 
* @@ -92,7 +144,7 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, int ret; int i; unsigned long dma_attrs = 0; - struct scatterlist *sg, *sg_list_start; + struct scatterlist *sg_cur, *sg_nxt; unsigned int gup_flags = FOLL_WRITE; if (dmasync) @@ -179,7 +231,8 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, if (!umem->writable) gup_flags |= FOLL_FORCE; - sg_list_start = umem->sg_head.sgl; + sg_cur = umem->sg_head.sgl; + sg_nxt = sg_cur; while (npages) { down_read(&mm->mmap_sem); @@ -196,24 +249,24 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr, cur_base += ret * PAGE_SIZE; npages -= ret; + ib_umem_add_sg_table(&sg_cur, &sg_nxt, page_list, ret); + /* Continue to hold the mmap_sem as vma_list access * needs to be protected. */ - for_each_sg(sg_list_start, sg, ret, i) { + for (i = 0; i < ret && umem->hugetlb; i++) { if (vma_list && !is_vm_hugetlb_page(vma_list[i])) umem->hugetlb = 0; - - sg_set_page(sg, page_list[i], PAGE_SIZE, 0); } - up_read(&mm->mmap_sem); - /* preparing for next loop */ - sg_list_start = sg; + up_read(&mm->mmap_sem); } + sg_mark_end(sg_cur); + umem->nmap = ib_dma_map_sg_attrs(context->device, umem->sg_head.sgl, - umem->npages, + sg_nents(umem->sg_head.sgl), DMA_BIDIRECTIONAL, dma_attrs); diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c index 1e2515e..fa0cb7c 100644 --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c @@ -3537,19 +3537,14 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig, u64 *pbl_tbl = pbl_tbl_orig; u64 paddr; u64 page_mask = (1ULL << page_shift) - 1; - int i, pages; - struct scatterlist *sg; - int entry; - - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - pages = sg_dma_len(sg) >> PAGE_SHIFT; - for (i = 0; i < pages; i++) { - paddr = sg_dma_address(sg) + (i << PAGE_SHIFT); - if (pbl_tbl == pbl_tbl_orig) - *pbl_tbl++ = paddr & ~page_mask; - else if ((paddr & page_mask) == 0) - *pbl_tbl++ = paddr; - } + struct sg_page_iter sg_iter; + + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + paddr = sg_page_iter_dma_address(&sg_iter); + if (pbl_tbl == pbl_tbl_orig) + *pbl_tbl++ = paddr & ~page_mask; + else if ((paddr & page_mask) == 0) + *pbl_tbl++ = paddr; } return pbl_tbl - pbl_tbl_orig; } diff --git a/drivers/infiniband/hw/bnxt_re/qplib_res.c b/drivers/infiniband/hw/bnxt_re/qplib_res.c index 59eeac5..2ec668f 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_res.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_res.c @@ -85,7 +85,7 @@ static void __free_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl, static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl, struct scatterlist *sghead, u32 pages, u32 pg_size) { - struct scatterlist *sg; + struct sg_page_iter sg_iter; bool is_umem = false; int i; @@ -116,12 +116,13 @@ static int __alloc_pbl(struct pci_dev *pdev, struct bnxt_qplib_pbl *pbl, } else { i = 0; is_umem = true; - for_each_sg(sghead, sg, pages, i) { - pbl->pg_map_arr[i] = sg_dma_address(sg); - pbl->pg_arr[i] = sg_virt(sg); + for_each_sg_page(sghead, &sg_iter, pages, 0) { + pbl->pg_map_arr[i] = sg_page_iter_dma_address(&sg_iter); + pbl->pg_arr[i] = page_address(sg_page_iter_page(&sg_iter)); if (!pbl->pg_arr[i]) goto fail; + i++; pbl->pg_count++; } } diff --git a/drivers/infiniband/hw/cxgb3/iwch_provider.c b/drivers/infiniband/hw/cxgb3/iwch_provider.c index b34b1a1..884a653 100644 --- 
a/drivers/infiniband/hw/cxgb3/iwch_provider.c +++ b/drivers/infiniband/hw/cxgb3/iwch_provider.c @@ -522,14 +522,13 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt, int acc, struct ib_udata *udata) { __be64 *pages; - int shift, n, len; - int i, k, entry; + int shift, n, i; int err = 0; struct iwch_dev *rhp; struct iwch_pd *php; struct iwch_mr *mhp; struct iwch_reg_user_mr_resp uresp; - struct scatterlist *sg; + struct sg_page_iter sg_iter; pr_debug("%s ib_pd %p\n", __func__, pd); php = to_iwch_pd(pd); @@ -563,19 +562,15 @@ static struct ib_mr *iwch_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, i = n = 0; - for_each_sg(mhp->umem->sg_head.sgl, sg, mhp->umem->nmap, entry) { - len = sg_dma_len(sg) >> shift; - for (k = 0; k < len; ++k) { - pages[i++] = cpu_to_be64(sg_dma_address(sg) + - (k << shift)); - if (i == PAGE_SIZE / sizeof *pages) { - err = iwch_write_pbl(mhp, pages, i, n); - if (err) - goto pbl_done; - n += i; - i = 0; - } - } + for_each_sg_page(mhp->umem->sg_head.sgl, &sg_iter, mhp->umem->nmap, 0) { + pages[i++] = cpu_to_be64(sg_page_iter_dma_address(&sg_iter)); + if (i == PAGE_SIZE / sizeof *pages) { + err = iwch_write_pbl(mhp, pages, i, n); + if (err) + goto pbl_done; + n += i; + i = 0; + } } if (i) diff --git a/drivers/infiniband/hw/cxgb4/mem.c b/drivers/infiniband/hw/cxgb4/mem.c index 7b76e6f..5806ac6 100644 --- a/drivers/infiniband/hw/cxgb4/mem.c +++ b/drivers/infiniband/hw/cxgb4/mem.c @@ -502,10 +502,9 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt, int acc, struct ib_udata *udata) { __be64 *pages; - int shift, n, len; - int i, k, entry; + int shift, n, i; int err = -ENOMEM; - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct c4iw_dev *rhp; struct c4iw_pd *php; struct c4iw_mr *mhp; @@ -556,21 +555,17 @@ struct ib_mr *c4iw_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, i = n = 0; - for_each_sg(mhp->umem->sg_head.sgl, sg, mhp->umem->nmap, entry) { - len = sg_dma_len(sg) >> shift; - for (k = 0; k < len; ++k) { - pages[i++] = cpu_to_be64(sg_dma_address(sg) + - (k << shift)); - if (i == PAGE_SIZE / sizeof *pages) { - err = write_pbl(&mhp->rhp->rdev, - pages, - mhp->attr.pbl_addr + (n << 3), i, - mhp->wr_waitp); - if (err) - goto pbl_done; - n += i; - i = 0; - } + for_each_sg_page(mhp->umem->sg_head.sgl, &sg_iter, mhp->umem->nmap, 0) { + pages[i++] = cpu_to_be64(sg_page_iter_dma_address(&sg_iter)); + if (i == PAGE_SIZE / sizeof *pages) { + err = write_pbl(&mhp->rhp->rdev, + pages, + mhp->attr.pbl_addr + (n << 3), i, + mhp->wr_waitp); + if (err) + goto pbl_done; + n += i; + i = 0; } } diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c index b74c742..0bbf68f 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c @@ -1866,9 +1866,8 @@ static int hns_roce_v1_write_mtpt(void *mb_buf, struct hns_roce_mr *mr, unsigned long mtpt_idx) { struct hns_roce_v1_mpt_entry *mpt_entry; - struct scatterlist *sg; + struct sg_page_iter sg_iter; u64 *pages; - int entry; int i; /* MPT filled into mailbox buf */ @@ -1923,8 +1922,8 @@ static int hns_roce_v1_write_mtpt(void *mb_buf, struct hns_roce_mr *mr, return -ENOMEM; i = 0; - for_each_sg(mr->umem->sg_head.sgl, sg, mr->umem->nmap, entry) { - pages[i] = ((u64)sg_dma_address(sg)) >> 12; + for_each_sg_page(mr->umem->sg_head.sgl, &sg_iter, mr->umem->nmap, 0) { + pages[i] = ((u64)sg_page_iter_dma_address(&sg_iter)) >> 12; /* Directly record to MTPT table 
firstly 7 entry */ if (i >= HNS_ROCE_MAX_INNER_MTPT_NUM) diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c index 3a66945..1a98719 100644 --- a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c @@ -1831,12 +1831,10 @@ static int hns_roce_v2_set_mac(struct hns_roce_dev *hr_dev, u8 phy_port, static int set_mtpt_pbl(struct hns_roce_v2_mpt_entry *mpt_entry, struct hns_roce_mr *mr) { - struct scatterlist *sg; + struct sg_page_iter sg_iter; u64 page_addr; u64 *pages; - int i, j; - int len; - int entry; + int i; mpt_entry->pbl_size = cpu_to_le32(mr->pbl_size); mpt_entry->pbl_ba_l = cpu_to_le32(lower_32_bits(mr->pbl_ba >> 3)); @@ -1849,17 +1847,14 @@ static int set_mtpt_pbl(struct hns_roce_v2_mpt_entry *mpt_entry, return -ENOMEM; i = 0; - for_each_sg(mr->umem->sg_head.sgl, sg, mr->umem->nmap, entry) { - len = sg_dma_len(sg) >> PAGE_SHIFT; - for (j = 0; j < len; ++j) { - page_addr = sg_dma_address(sg) + - (j << mr->umem->page_shift); - pages[i] = page_addr >> 6; - /* Record the first 2 entry directly to MTPT table */ - if (i >= HNS_ROCE_V2_MAX_INNER_MTPT_NUM - 1) - goto found; - i++; - } + for_each_sg_page(mr->umem->sg_head.sgl, &sg_iter, mr->umem->nmap, 0) { + page_addr = sg_page_iter_dma_address(&sg_iter); + pages[i] = page_addr >> 6; + + /* Record the first 2 entry directly to MTPT table */ + if (i >= HNS_ROCE_V2_MAX_INNER_MTPT_NUM - 1) + goto found; + i++; } found: mpt_entry->pa0_l = cpu_to_le32(lower_32_bits(pages[0])); diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c b/drivers/infiniband/hw/hns/hns_roce_mr.c index ee5991b..2cf5690 100644 --- a/drivers/infiniband/hw/hns/hns_roce_mr.c +++ b/drivers/infiniband/hw/hns/hns_roce_mr.c @@ -976,12 +976,11 @@ int hns_roce_ib_umem_write_mtt(struct hns_roce_dev *hr_dev, struct hns_roce_mtt *mtt, struct ib_umem *umem) { struct device *dev = hr_dev->dev; - struct scatterlist *sg; + struct sg_page_iter sg_iter; unsigned int order; - int i, k, entry; int npage = 0; int ret = 0; - int len; + int i; u64 page_addr; u64 *pages; u32 bt_page_size; @@ -1014,29 +1013,25 @@ int hns_roce_ib_umem_write_mtt(struct hns_roce_dev *hr_dev, i = n = 0; - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - len = sg_dma_len(sg) >> PAGE_SHIFT; - for (k = 0; k < len; ++k) { - page_addr = - sg_dma_address(sg) + (k << umem->page_shift); - if (!(npage % (1 << (mtt->page_shift - PAGE_SHIFT)))) { - if (page_addr & ((1 << mtt->page_shift) - 1)) { - dev_err(dev, "page_addr 0x%llx is not page_shift %d alignment!\n", - page_addr, mtt->page_shift); - ret = -EINVAL; - goto out; - } - pages[i++] = page_addr; - } - npage++; - if (i == bt_page_size / sizeof(u64)) { - ret = hns_roce_write_mtt(hr_dev, mtt, n, i, - pages); - if (ret) - goto out; - n += i; - i = 0; + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + page_addr = sg_page_iter_dma_address(&sg_iter); + if (!(npage % (1 << (mtt->page_shift - PAGE_SHIFT)))) { + if (page_addr & ((1 << mtt->page_shift) - 1)) { + dev_err(dev, "page_addr 0x%llx is not page_shift %d alignment!\n", + page_addr, mtt->page_shift); + ret = -EINVAL; + goto out; } + pages[i++] = page_addr; + } + npage++; + if (i == bt_page_size / sizeof(u64)) { + ret = hns_roce_write_mtt(hr_dev, mtt, n, i, + pages); + if (ret) + goto out; + n += i; + i = 0; } } @@ -1052,10 +1047,8 @@ static int hns_roce_ib_umem_write_mr(struct hns_roce_dev *hr_dev, struct hns_roce_mr *mr, struct ib_umem *umem) { - struct scatterlist *sg; - int i = 0, j = 0, k; - int entry; - int 
len; + struct sg_page_iter sg_iter; + int i = 0, j = 0; u64 page_addr; u32 pbl_bt_sz; @@ -1063,27 +1056,22 @@ static int hns_roce_ib_umem_write_mr(struct hns_roce_dev *hr_dev, return 0; pbl_bt_sz = 1 << (hr_dev->caps.pbl_ba_pg_sz + PAGE_SHIFT); - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - len = sg_dma_len(sg) >> PAGE_SHIFT; - for (k = 0; k < len; ++k) { - page_addr = sg_dma_address(sg) + - (k << umem->page_shift); - - if (!hr_dev->caps.pbl_hop_num) { - mr->pbl_buf[i++] = page_addr >> 12; - } else if (hr_dev->caps.pbl_hop_num == 1) { - mr->pbl_buf[i++] = page_addr; - } else { - if (hr_dev->caps.pbl_hop_num == 2) - mr->pbl_bt_l1[i][j] = page_addr; - else if (hr_dev->caps.pbl_hop_num == 3) - mr->pbl_bt_l2[i][j] = page_addr; - - j++; - if (j >= (pbl_bt_sz / 8)) { - i++; - j = 0; - } + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + page_addr = sg_page_iter_dma_address(&sg_iter); + if (!hr_dev->caps.pbl_hop_num) { + mr->pbl_buf[i++] = page_addr >> 12; + } else if (hr_dev->caps.pbl_hop_num == 1) { + mr->pbl_buf[i++] = page_addr; + } else { + if (hr_dev->caps.pbl_hop_num == 2) + mr->pbl_bt_l1[i][j] = page_addr; + else if (hr_dev->caps.pbl_hop_num == 3) + mr->pbl_bt_l2[i][j] = page_addr; + + j++; + if (j >= (pbl_bt_sz / 8)) { + i++; + j = 0; } } } diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 0b675b0..3e7a299 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -1369,32 +1369,29 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr, { struct ib_umem *region = iwmr->region; struct i40iw_pbl *iwpbl = &iwmr->iwpbl; - int chunk_pages, entry, i; struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc; struct i40iw_pble_info *pinfo; - struct scatterlist *sg; + struct sg_page_iter sg_iter; u64 pg_addr = 0; u32 idx = 0; + bool first_pg = true; pinfo = (level == I40IW_LEVEL_1) ? 
NULL : palloc->level2.leaf; - for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) { - chunk_pages = sg_dma_len(sg) >> region->page_shift; - if ((iwmr->type == IW_MEMREG_TYPE_QP) && - !iwpbl->qp_mr.sq_page) - iwpbl->qp_mr.sq_page = sg_page(sg); - for (i = 0; i < chunk_pages; i++) { - pg_addr = sg_dma_address(sg) + - (i << region->page_shift); - - if ((entry + i) == 0) - *pbl = cpu_to_le64(pg_addr & iwmr->page_msk); - else if (!(pg_addr & ~iwmr->page_msk)) - *pbl = cpu_to_le64(pg_addr); - else - continue; - pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx); - } + if (iwmr->type == IW_MEMREG_TYPE_QP) + iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl); + + for_each_sg_page(region->sg_head.sgl, &sg_iter, region->nmap, 0) { + pg_addr = sg_page_iter_dma_address(&sg_iter); + if (first_pg) + *pbl = cpu_to_le64(pg_addr & iwmr->page_msk); + else if (!(pg_addr & ~iwmr->page_msk)) + *pbl = cpu_to_le64(pg_addr); + else + continue; + + first_pg = false; + pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx); } } diff --git a/drivers/infiniband/hw/mthca/mthca_provider.c b/drivers/infiniband/hw/mthca/mthca_provider.c index 82cb6b7..9efc989 100644 --- a/drivers/infiniband/hw/mthca/mthca_provider.c +++ b/drivers/infiniband/hw/mthca/mthca_provider.c @@ -907,12 +907,11 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u64 virt, int acc, struct ib_udata *udata) { struct mthca_dev *dev = to_mdev(pd->device); - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct mthca_mr *mr; struct mthca_reg_mr ucmd; u64 *pages; - int shift, n, len; - int i, k, entry; + int shift, n, i; int err = 0; int write_mtt_size; @@ -958,21 +957,19 @@ static struct ib_mr *mthca_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, write_mtt_size = min(mthca_write_mtt_size(dev), (int) (PAGE_SIZE / sizeof *pages)); - for_each_sg(mr->umem->sg_head.sgl, sg, mr->umem->nmap, entry) { - len = sg_dma_len(sg) >> shift; - for (k = 0; k < len; ++k) { - pages[i++] = sg_dma_address(sg) + (k << shift); - /* - * Be friendly to write_mtt and pass it chunks - * of appropriate size. - */ - if (i == write_mtt_size) { - err = mthca_write_mtt(dev, mr->mtt, n, pages, i); - if (err) - goto mtt_done; - n += i; - i = 0; - } + for_each_sg_page(mr->umem->sg_head.sgl, &sg_iter, mr->umem->nmap, 0) { + pages[i++] = sg_page_iter_dma_address(&sg_iter); + + /* + * Be friendly to write_mtt and pass it chunks + * of appropriate size. 
+ */ + if (i == write_mtt_size) { + err = mthca_write_mtt(dev, mr->mtt, n, pages, i); + if (err) + goto mtt_done; + n += i; + i = 0; } } diff --git a/drivers/infiniband/hw/nes/nes_verbs.c b/drivers/infiniband/hw/nes/nes_verbs.c index 4e7f08e..dc32b7d 100644 --- a/drivers/infiniband/hw/nes/nes_verbs.c +++ b/drivers/infiniband/hw/nes/nes_verbs.c @@ -2109,7 +2109,7 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, struct nes_device *nesdev = nesvnic->nesdev; struct nes_adapter *nesadapter = nesdev->nesadapter; struct ib_mr *ibmr = ERR_PTR(-EINVAL); - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct nes_ucontext *nes_ucontext; struct nes_pbl *nespbl; struct nes_mr *nesmr; @@ -2117,10 +2117,9 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, struct nes_mem_reg_req req; struct nes_vpbl vpbl; struct nes_root_vpbl root_vpbl; - int entry, page_index; + int page_index; int page_count = 0; int err, pbl_depth = 0; - int chunk_pages; int ret; u32 stag; u32 stag_index = 0; @@ -2132,7 +2131,6 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, u16 pbl_count; u8 single_page = 1; u8 stag_key; - int first_page = 1; region = ib_umem_get(pd->uobject->context, start, length, acc, 0); if (IS_ERR(region)) { @@ -2183,18 +2181,18 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, } nesmr->region = region; - for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) { - if (sg_dma_address(sg) & ~PAGE_MASK) { + for_each_sg_page(region->sg_head.sgl, &sg_iter, region->nmap, 0) { + if (sg_dma_address(sg_iter.sg) & ~PAGE_MASK) { ib_umem_release(region); nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); nes_debug(NES_DBG_MR, "Unaligned Memory Buffer: 0x%x\n", - (unsigned int) sg_dma_address(sg)); + (unsigned int)sg_dma_address(sg_iter.sg)); ibmr = ERR_PTR(-EINVAL); kfree(nesmr); goto reg_user_mr_err; } - if (!sg_dma_len(sg)) { + if (!sg_dma_len(sg_iter.sg)) { ib_umem_release(region); nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); @@ -2204,103 +2202,94 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, goto reg_user_mr_err; } - region_length += sg_dma_len(sg); - chunk_pages = sg_dma_len(sg) >> 12; + region_length += PAGE_SIZE; region_length -= skip_pages << 12; - for (page_index = skip_pages; page_index < chunk_pages; page_index++) { - skip_pages = 0; - if ((page_count != 0) && (page_count << 12) - (ib_umem_offset(region) & (4096 - 1)) >= region->length) - goto enough_pages; - if ((page_count&0x01FF) == 0) { - if (page_count >= 1024 * 512) { + skip_pages = 0; + if ((page_count != 0) && (page_count << 12) - (ib_umem_offset(region) & (4096 - 1)) >= region->length) + goto enough_pages; + if ((page_count & 0x01FF) == 0) { + if (page_count >= 1024 * 512) { + ib_umem_release(region); + nes_free_resource(nesadapter, + nesadapter->allocated_mrs, stag_index); + kfree(nesmr); + ibmr = ERR_PTR(-E2BIG); + goto reg_user_mr_err; + } + if (root_pbl_index == 1) { + root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, + 8192, &root_vpbl.pbl_pbase); + nes_debug(NES_DBG_MR, "Allocating root PBL, va = %p, pa = 0x%08X\n", + root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); + if (!root_vpbl.pbl_vbase) { ib_umem_release(region); - nes_free_resource(nesadapter, - nesadapter->allocated_mrs, stag_index); + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, 
nesadapter->allocated_mrs, + stag_index); kfree(nesmr); - ibmr = ERR_PTR(-E2BIG); + ibmr = ERR_PTR(-ENOMEM); goto reg_user_mr_err; } - if (root_pbl_index == 1) { - root_vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, - 8192, &root_vpbl.pbl_pbase); - nes_debug(NES_DBG_MR, "Allocating root PBL, va = %p, pa = 0x%08X\n", - root_vpbl.pbl_vbase, (unsigned int)root_vpbl.pbl_pbase); - if (!root_vpbl.pbl_vbase) { - ib_umem_release(region); - pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, - vpbl.pbl_pbase); - nes_free_resource(nesadapter, nesadapter->allocated_mrs, - stag_index); - kfree(nesmr); - ibmr = ERR_PTR(-ENOMEM); - goto reg_user_mr_err; - } - root_vpbl.leaf_vpbl = kcalloc(1024, - sizeof(*root_vpbl.leaf_vpbl), - GFP_KERNEL); - if (!root_vpbl.leaf_vpbl) { - ib_umem_release(region); - pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, - root_vpbl.pbl_pbase); - pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, - vpbl.pbl_pbase); - nes_free_resource(nesadapter, nesadapter->allocated_mrs, - stag_index); - kfree(nesmr); - ibmr = ERR_PTR(-ENOMEM); - goto reg_user_mr_err; - } - root_vpbl.pbl_vbase[0].pa_low = - cpu_to_le32((u32)vpbl.pbl_pbase); - root_vpbl.pbl_vbase[0].pa_high = - cpu_to_le32((u32)((((u64)vpbl.pbl_pbase) >> 32))); - root_vpbl.leaf_vpbl[0] = vpbl; - } - vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096, - &vpbl.pbl_pbase); - nes_debug(NES_DBG_MR, "Allocating leaf PBL, va = %p, pa = 0x%08X\n", - vpbl.pbl_vbase, (unsigned int)vpbl.pbl_pbase); - if (!vpbl.pbl_vbase) { + root_vpbl.leaf_vpbl = kcalloc(1024, + sizeof(*root_vpbl.leaf_vpbl), + GFP_KERNEL); + if (!root_vpbl.leaf_vpbl) { ib_umem_release(region); - nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); - ibmr = ERR_PTR(-ENOMEM); + pci_free_consistent(nesdev->pcidev, 8192, root_vpbl.pbl_vbase, + root_vpbl.pbl_pbase); + pci_free_consistent(nesdev->pcidev, 4096, vpbl.pbl_vbase, + vpbl.pbl_pbase); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, + stag_index); kfree(nesmr); + ibmr = ERR_PTR(-ENOMEM); goto reg_user_mr_err; } - if (1 <= root_pbl_index) { - root_vpbl.pbl_vbase[root_pbl_index].pa_low = - cpu_to_le32((u32)vpbl.pbl_pbase); - root_vpbl.pbl_vbase[root_pbl_index].pa_high = - cpu_to_le32((u32)((((u64)vpbl.pbl_pbase)>>32))); - root_vpbl.leaf_vpbl[root_pbl_index] = vpbl; - } - root_pbl_index++; - cur_pbl_index = 0; + root_vpbl.pbl_vbase[0].pa_low = + cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[0].pa_high = + cpu_to_le32((u32)((((u64)vpbl.pbl_pbase) >> 32))); + root_vpbl.leaf_vpbl[0] = vpbl; } - if (single_page) { - if (page_count != 0) { - if ((last_dma_addr+4096) != - (sg_dma_address(sg)+ - (page_index*4096))) - single_page = 0; - last_dma_addr = sg_dma_address(sg)+ - (page_index*4096); - } else { - first_dma_addr = sg_dma_address(sg)+ - (page_index*4096); - last_dma_addr = first_dma_addr; - } + vpbl.pbl_vbase = pci_alloc_consistent(nesdev->pcidev, 4096, + &vpbl.pbl_pbase); + nes_debug(NES_DBG_MR, "Allocating leaf PBL, va = %p, pa = 0x%08X\n", + vpbl.pbl_vbase, (unsigned int)vpbl.pbl_pbase); + if (!vpbl.pbl_vbase) { + ib_umem_release(region); + nes_free_resource(nesadapter, nesadapter->allocated_mrs, stag_index); + ibmr = ERR_PTR(-ENOMEM); + kfree(nesmr); + goto reg_user_mr_err; + } + if (1 <= root_pbl_index) { + root_vpbl.pbl_vbase[root_pbl_index].pa_low = + cpu_to_le32((u32)vpbl.pbl_pbase); + root_vpbl.pbl_vbase[root_pbl_index].pa_high = + cpu_to_le32((u32)((((u64)vpbl.pbl_pbase) >> 32))); + root_vpbl.leaf_vpbl[root_pbl_index] = vpbl; + } + 
root_pbl_index++; + cur_pbl_index = 0; + } + if (single_page) { + if (page_count != 0) { + if ((last_dma_addr + 4096) != sg_page_iter_dma_address(&sg_iter)) + single_page = 0; + last_dma_addr = sg_page_iter_dma_address(&sg_iter); + } else { + first_dma_addr = sg_page_iter_dma_address(&sg_iter); + last_dma_addr = first_dma_addr; } - - vpbl.pbl_vbase[cur_pbl_index].pa_low = - cpu_to_le32((u32)(sg_dma_address(sg)+ - (page_index*4096))); - vpbl.pbl_vbase[cur_pbl_index].pa_high = - cpu_to_le32((u32)((((u64)(sg_dma_address(sg)+ - (page_index*4096))) >> 32))); - cur_pbl_index++; - page_count++; } + + vpbl.pbl_vbase[cur_pbl_index].pa_low = + cpu_to_le32((u32)(sg_page_iter_dma_address(&sg_iter))); + vpbl.pbl_vbase[cur_pbl_index].pa_high = + cpu_to_le32((u32)((u64)(sg_page_iter_dma_address(&sg_iter)))); + cur_pbl_index++; + page_count++; } enough_pages: @@ -2412,26 +2401,14 @@ static struct ib_mr *nes_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, nespbl->pbl_size, (unsigned long) nespbl->pbl_pbase, (void *) nespbl->pbl_vbase, nespbl->user_base); - for_each_sg(region->sg_head.sgl, sg, region->nmap, entry) { - chunk_pages = sg_dma_len(sg) >> 12; - chunk_pages += (sg_dma_len(sg) & (4096-1)) ? 1 : 0; - if (first_page) { - nespbl->page = sg_page(sg); - first_page = 0; - } - - for (page_index = 0; page_index < chunk_pages; page_index++) { - ((__le32 *)pbl)[0] = cpu_to_le32((u32) - (sg_dma_address(sg)+ - (page_index*4096))); - ((__le32 *)pbl)[1] = cpu_to_le32(((u64) - (sg_dma_address(sg)+ - (page_index*4096)))>>32); - nes_debug(NES_DBG_MR, "pbl=%p, *pbl=0x%016llx, 0x%08x%08x\n", pbl, - (unsigned long long)*pbl, - le32_to_cpu(((__le32 *)pbl)[1]), le32_to_cpu(((__le32 *)pbl)[0])); - pbl++; - } + nespbl->page = sg_page(region->sg_head.sgl); + for_each_sg_page(region->sg_head.sgl, &sg_iter, region->nmap, 0) { + ((__le32 *)pbl)[0] = cpu_to_le32((u32)(sg_page_iter_dma_address(&sg_iter))); + ((__le32 *)pbl)[1] = cpu_to_le32(((u64)(sg_page_iter_dma_address(&sg_iter)))>>32); + nes_debug(NES_DBG_MR, "pbl=%p, *pbl=0x%016llx, 0x%08x%08x\n", pbl, + (unsigned long long)*pbl, + le32_to_cpu(((__le32 *)pbl)[1]), le32_to_cpu(((__le32 *)pbl)[0])); + pbl++; } if (req.reg_type == IWNES_MEMREG_TYPE_QP) { diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c index c46bed0..05224ef 100644 --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c @@ -854,10 +854,11 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr, u32 num_pbes) { struct ocrdma_pbe *pbe; - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct ocrdma_pbl *pbl_tbl = mr->hwmr.pbl_table; struct ib_umem *umem = mr->umem; - int shift, pg_cnt, pages, pbe_cnt, entry, total_num_pbes = 0; + int pbe_cnt, total_num_pbes = 0; + u64 pg_addr; if (!mr->hwmr.num_pbes) return; @@ -865,36 +866,27 @@ static void build_user_pbes(struct ocrdma_dev *dev, struct ocrdma_mr *mr, pbe = (struct ocrdma_pbe *)pbl_tbl->va; pbe_cnt = 0; - shift = umem->page_shift; - - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - pages = sg_dma_len(sg) >> shift; - for (pg_cnt = 0; pg_cnt < pages; pg_cnt++) { - /* store the page address in pbe */ - pbe->pa_lo = - cpu_to_le32(sg_dma_address(sg) + - (pg_cnt << shift)); - pbe->pa_hi = - cpu_to_le32(upper_32_bits(sg_dma_address(sg) + - (pg_cnt << shift))); - pbe_cnt += 1; - total_num_pbes += 1; - pbe++; - - /* if done building pbes, issue the mbx cmd. 
*/ - if (total_num_pbes == num_pbes) - return; - - /* if the given pbl is full storing the pbes, - * move to next pbl. - */ - if (pbe_cnt == - (mr->hwmr.pbl_size / sizeof(u64))) { - pbl_tbl++; - pbe = (struct ocrdma_pbe *)pbl_tbl->va; - pbe_cnt = 0; - } + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + /* store the page address in pbe */ + pg_addr = sg_page_iter_dma_address(&sg_iter); + pbe->pa_lo = cpu_to_le32(pg_addr); + pbe->pa_hi = cpu_to_le32(upper_32_bits(pg_addr)); + pbe_cnt += 1; + total_num_pbes += 1; + pbe++; + + /* if done building pbes, issue the mbx cmd. */ + if (total_num_pbes == num_pbes) + return; + /* if the given pbl is full storing the pbes, + * move to next pbl. + */ + if (pbe_cnt == + (mr->hwmr.pbl_size / sizeof(u64))) { + pbl_tbl++; + pbe = (struct ocrdma_pbe *)pbl_tbl->va; + pbe_cnt = 0; } } } diff --git a/drivers/infiniband/hw/qedr/verbs.c b/drivers/infiniband/hw/qedr/verbs.c index b342a70..db0f359 100644 --- a/drivers/infiniband/hw/qedr/verbs.c +++ b/drivers/infiniband/hw/qedr/verbs.c @@ -636,13 +636,12 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem, struct qedr_pbl *pbl, struct qedr_pbl_info *pbl_info, u32 pg_shift) { - int shift, pg_cnt, pages, pbe_cnt, total_num_pbes = 0; + int pbe_cnt, total_num_pbes = 0; u32 fw_pg_cnt, fw_pg_per_umem_pg; struct qedr_pbl *pbl_tbl; - struct scatterlist *sg; + struct sg_page_iter sg_iter; struct regpair *pbe; u64 pg_addr; - int entry; if (!pbl_info->num_pbes) return; @@ -663,38 +662,33 @@ static void qedr_populate_pbls(struct qedr_dev *dev, struct ib_umem *umem, pbe_cnt = 0; - shift = umem->page_shift; - fw_pg_per_umem_pg = BIT(umem->page_shift - pg_shift); - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - pages = sg_dma_len(sg) >> shift; - pg_addr = sg_dma_address(sg); - for (pg_cnt = 0; pg_cnt < pages; pg_cnt++) { - for (fw_pg_cnt = 0; fw_pg_cnt < fw_pg_per_umem_pg;) { - pbe->lo = cpu_to_le32(pg_addr); - pbe->hi = cpu_to_le32(upper_32_bits(pg_addr)); - - pg_addr += BIT(pg_shift); - pbe_cnt++; - total_num_pbes++; - pbe++; - - if (total_num_pbes == pbl_info->num_pbes) - return; - - /* If the given pbl is full storing the pbes, - * move to next pbl. - */ - if (pbe_cnt == - (pbl_info->pbl_size / sizeof(u64))) { - pbl_tbl++; - pbe = (struct regpair *)pbl_tbl->va; - pbe_cnt = 0; - } + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + pg_addr = sg_page_iter_dma_address(&sg_iter); + for (fw_pg_cnt = 0; fw_pg_cnt < fw_pg_per_umem_pg;) { + pbe->lo = cpu_to_le32(pg_addr); + pbe->hi = cpu_to_le32(upper_32_bits(pg_addr)); + + pg_addr += BIT(pg_shift); + pbe_cnt++; + total_num_pbes++; + pbe++; - fw_pg_cnt++; + if (total_num_pbes == pbl_info->num_pbes) + return; + + /* If the given pbl is full storing the pbes, + * move to next pbl. 
+ */ + if (pbe_cnt == + (pbl_info->pbl_size / sizeof(u64))) { + pbl_tbl++; + pbe = (struct regpair *)pbl_tbl->va; + pbe_cnt = 0; } + + fw_pg_cnt++; } } } diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c index fb0c5c0..d40099b 100644 --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c @@ -183,25 +183,20 @@ int pvrdma_page_dir_insert_umem(struct pvrdma_page_dir *pdir, struct ib_umem *umem, u64 offset) { u64 i = offset; - int j, entry; - int ret = 0, len = 0; - struct scatterlist *sg; + int ret = 0; + struct sg_page_iter sg_iter; if (offset >= pdir->npages) return -EINVAL; - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - len = sg_dma_len(sg) >> PAGE_SHIFT; - for (j = 0; j < len; j++) { - dma_addr_t addr = sg_dma_address(sg) + - (j << umem->page_shift); + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + dma_addr_t addr = sg_page_iter_dma_address(&sg_iter); - ret = pvrdma_page_dir_insert_dma(pdir, i, addr); - if (ret) - goto exit; + ret = pvrdma_page_dir_insert_dma(pdir, i, addr); + if (ret) + goto exit; - i++; - } + i++; } exit: diff --git a/drivers/infiniband/sw/rdmavt/mr.c b/drivers/infiniband/sw/rdmavt/mr.c index 49c9541..8f02059 100644 --- a/drivers/infiniband/sw/rdmavt/mr.c +++ b/drivers/infiniband/sw/rdmavt/mr.c @@ -381,8 +381,8 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, { struct rvt_mr *mr; struct ib_umem *umem; - struct scatterlist *sg; - int n, m, entry; + struct sg_page_iter sg_iter; + int n, m; struct ib_mr *ret; if (length == 0) @@ -411,10 +411,10 @@ struct ib_mr *rvt_reg_user_mr(struct ib_pd *pd, u64 start, u64 length, mr->mr.page_shift = umem->page_shift; m = 0; n = 0; - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { void *vaddr; - vaddr = page_address(sg_page(sg)); + vaddr = page_address(sg_page_iter_page(&sg_iter)); if (!vaddr) { ret = ERR_PTR(-EINVAL); goto bail_inval; diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c index 9d3916b..a6937de 100644 --- a/drivers/infiniband/sw/rxe/rxe_mr.c +++ b/drivers/infiniband/sw/rxe/rxe_mr.c @@ -162,11 +162,10 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, u64 length, u64 iova, int access, struct ib_udata *udata, struct rxe_mem *mem) { - int entry; struct rxe_map **map; struct rxe_phys_buf *buf = NULL; struct ib_umem *umem; - struct scatterlist *sg; + struct sg_page_iter sg_iter; int num_buf; void *vaddr; int err; @@ -199,8 +198,8 @@ int rxe_mem_init_user(struct rxe_pd *pd, u64 start, if (length > 0) { buf = map[0]->buf; - for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { - vaddr = page_address(sg_page(sg)); + for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) { + vaddr = page_address(sg_page_iter_page(&sg_iter)); if (!vaddr) { pr_warn("null vaddr\n"); err = -ENOMEM; From patchwork Mon Dec 24 22:32:23 2018 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Shiraz Saleem X-Patchwork-Id: 10742481 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8C4676C5 for ; Mon, 24 Dec 2018 22:32:44 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7D85D28A00 for ; Mon, 24 Dec 2018 22:32:44 
+0000 (UTC)
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 2/6] RDMA/umem: Add API to find best driver supported page size in an MR
Date: Mon, 24 Dec 2018 16:32:23 -0600
Message-Id: <20181224223227.18016-3-shiraz.saleem@intel.com>
In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com>
References: <20181224223227.18016-1-shiraz.saleem@intel.com>

This helper iterates through the SG list to find the best page size to use from a bitmap of HW supported page sizes. Drivers that support multiple page sizes, but not mixed page sizes within an MR, can use this API.
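How the page bit is chosen can be illustrated with a standalone model of the patch's ib_umem_find_pg_bit(): the trailing-zero count of an address caps the page size that address can be aligned to, and the largest HW-supported size under that cap wins. The addresses and the supported-size bitmap below are illustrative assumptions, and the GCC builtins stand in for the kernel's __ffs()/fls64().

/* Standalone model of the page-bit selection in ib_umem_find_pg_bit(). */
#include <stdio.h>

static unsigned int find_pg_bit(unsigned long long phyaddr,
				unsigned long long supported_pgsz)
{
	/* Trailing zero bits in the address bound the usable page size */
	unsigned int num_zeroes = __builtin_ctzll(phyaddr);
	unsigned long long pgsz;

	/* Keep only supported sizes no larger than 2^num_zeroes */
	pgsz = supported_pgsz & ((1ULL << (num_zeroes + 1)) - 1);
	if (!pgsz)	/* fall back to the smallest supported size */
		return __builtin_ctzll(supported_pgsz);

	return 63 - __builtin_clzll(pgsz);	/* highest remaining bit */
}

int main(void)
{
	/* Hypothetical HW supporting 4K and 2M pages: bits 12 and 21 */
	unsigned long long supported = (1ULL << 12) | (1ULL << 21);

	/* A 2M-aligned address can use the 2M size (prints 21)... */
	printf("pg_bit = %u\n", find_pg_bit(0x40000000ULL, supported));
	/* ...while a 4K-aligned one is limited to 4K (prints 12) */
	printf("pg_bit = %u\n", find_pg_bit(0x40001000ULL, supported));
	return 0;
}

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J.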
Ruhl Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/umem.c | 86 ++++++++++++++++++++++++++++++++++++++++++ include/rdma/ib_umem.h | 9 +++++ 2 files changed, 95 insertions(+) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index 64bacc5..b2f2d75 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -119,6 +119,92 @@ static void ib_umem_add_sg_table(struct scatterlist **cur, } /** + * ib_umem_find_pg_bit - Find the page bit to use for phyaddr + * + * @phyaddr: Physical address after DMA translation + * @supported_pgsz: bitmask of HW supported page sizes + */ +static unsigned int ib_umem_find_pg_bit(unsigned long phyaddr, + unsigned long supported_pgsz) +{ + unsigned long num_zeroes; + unsigned long pgsz; + + /* Trailing zero bits in the address */ + num_zeroes = __ffs(phyaddr); + + /* Find page bit such that phyaddr is aligned to the highest supported + * HW page size + */ + pgsz = supported_pgsz & (BIT_ULL(num_zeroes + 1) - 1); + if (!pgsz) + return __ffs(supported_pgsz); + + return (fls64(pgsz) - 1); +} + +/** + * ib_umem_find_single_pg_size - Find best HW page size to use for this MR + * @umem: umem struct + * @supported_pgsz: bitmask of HW supported page sizes + * @uvirt_addr: user-space virtual MR base address + * + * This helper is intended for HW that support multiple page + * sizes but can do only a single page size in an MR. + */ +unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem, + unsigned long supported_pgsz, + unsigned long uvirt_addr) +{ + struct scatterlist *sg; + unsigned int pg_bit_sg, min_pg_bit, best_pg_bit; + int i; + + if (!supported_pgsz) + return 0; + + min_pg_bit = __ffs(supported_pgsz); + best_pg_bit = fls64(supported_pgsz) - 1; + + for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) { + unsigned long dma_addr_start, dma_addr_end; + + dma_addr_start = sg_dma_address(sg); + dma_addr_end = sg_dma_address(sg) + sg_dma_len(sg); + if (!i) { + pg_bit_sg = ib_umem_find_pg_bit(dma_addr_end, supported_pgsz); + + /* The start offset of the MR into a first _large_ page + * should line up exactly for the user-space virtual buf + * and physical buffer, in order to upgrade the page bit + */ + if (pg_bit_sg > PAGE_SHIFT) { + unsigned int uvirt_pg_bit; + + uvirt_pg_bit = ib_umem_find_pg_bit(uvirt_addr + sg_dma_len(sg), + supported_pgsz); + pg_bit_sg = min_t(unsigned int, uvirt_pg_bit, pg_bit_sg); + } + } else if (i == (umem->nmap - 1)) { + /* last SGE: Does not matter if MR ends at an + * unaligned offset. + */ + pg_bit_sg = ib_umem_find_pg_bit(dma_addr_start, supported_pgsz); + } else { + pg_bit_sg = ib_umem_find_pg_bit(dma_addr_start, + supported_pgsz & (BIT_ULL(__ffs(sg_dma_len(sg)) + 1) - 1)); + } + + best_pg_bit = min_t(unsigned int, best_pg_bit, pg_bit_sg); + if (best_pg_bit == min_pg_bit) + break; + } + + return BIT_ULL(best_pg_bit); +} +EXPORT_SYMBOL(ib_umem_find_single_pg_size); + +/** * ib_umem_get - Pin and DMA map userspace memory. * * If access flags indicate ODP memory, avoid pinning. 
Instead, stores diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 5d3755e..3e8e1ed 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -86,6 +86,9 @@ void ib_umem_release(struct ib_umem *umem); int ib_umem_page_count(struct ib_umem *umem); int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset, size_t length); +unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem, + unsigned long supported_pgsz, + unsigned long uvirt_addr); #else /* CONFIG_INFINIBAND_USER_MEM */ @@ -102,6 +105,12 @@ static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offs size_t length) { return -EINVAL; } +static inline int ib_umem_find_single_pg_size(struct ib_umem *umem, + unsigned long supported_pgsz, + unsigned long uvirt_addr) { + return -EINVAL; +} + #endif /* CONFIG_INFINIBAND_USER_MEM */ #endif /* IB_UMEM_H */

From patchwork Mon Dec 24 22:32:24 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742485
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 3/6] RDMA/umem: Add API to return optimal HW DMA addresses from SG list
Date: Mon, 24 Dec 2018 16:32:24 -0600
Message-Id: <20181224223227.18016-4-shiraz.saleem@intel.com>
In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com>
References: <20181224223227.18016-1-shiraz.saleem@intel.com>

This helper iterates the SG list and returns suitable HW aligned DMA addresses within a driver supported page size.
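A standalone sketch of one step of the address chunking this kind of helper performs follows; the page-size bit, addresses, and lengths are illustrative assumptions, not values from the patch, and the loop only models ib_umem_next_phys_iter() at a high level.

/* Model one ib_umem_next_phys_iter()-style step: carve a DMA run into
 * blocks of 2^pg_bit bytes, reporting the aligned base each time. With
 * pg_bit = 21 (2M) a 12K run starting 4K into a 2M page yields a single
 * 2M-aligned base, which is what a driver writes into its page table.
 */
#include <stdio.h>

int main(void)
{
	unsigned int pg_bit = 21;			/* assumed 2M pages */
	unsigned long long addr = 0x40001000ULL;	/* assumed run start */
	unsigned long long remaining = 3 * 4096;	/* assumed run length */
	unsigned long long pg_mask = ~((1ULL << pg_bit) - 1);

	while (remaining) {
		unsigned long long offset = addr & ~pg_mask;
		unsigned long long base = addr & pg_mask;
		unsigned long long len = (1ULL << pg_bit) - offset;

		if (len > remaining)
			len = remaining;

		printf("block base 0x%llx, consumed %llu bytes\n", base, len);
		remaining -= len;
		addr += len;	/* the kernel helper tracks this via len + offset */
	}
	return 0;
}
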
The implementation is intended to work for HW that support single page sizes or mixed page sizes in an MR. This avoids the need for having driver specific algorithms to achieve the same thing and redundant walks of the SG list. Suggested-by: Jason Gunthorpe Reviewed-by: Michael J. Ruhl Signed-off-by: Shiraz Saleem --- drivers/infiniband/core/umem.c | 78 ++++++++++++++++++++++++++++++++++++++++++ include/rdma/ib_umem.h | 23 +++++++++++++ 2 files changed, 101 insertions(+) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index b2f2d75..8f8a1f6 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -204,6 +204,84 @@ unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem, } EXPORT_SYMBOL(ib_umem_find_single_pg_size); +static unsigned int ib_umem_find_mixed_pg_bit(struct scatterlist *sgl_head, + struct sg_phys_iter *sg_phys_iter) +{ + unsigned long dma_addr_start, dma_addr_end; + + dma_addr_start = sg_dma_address(sg_phys_iter->sg); + dma_addr_end = sg_dma_address(sg_phys_iter->sg) + + sg_dma_len(sg_phys_iter->sg); + + if (sg_phys_iter->sg == sgl_head) + return ib_umem_find_pg_bit(dma_addr_end, + sg_phys_iter->supported_pgsz); + else if (sg_is_last(sg_phys_iter->sg)) + return ib_umem_find_pg_bit(dma_addr_start, + sg_phys_iter->supported_pgsz); + else + return ib_umem_find_pg_bit(sg_phys_iter->phyaddr, + sg_phys_iter->supported_pgsz); +} +void ib_umem_start_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter, + unsigned long supported_pgsz) +{ + memset(sg_phys_iter, 0, sizeof(struct sg_phys_iter)); + sg_phys_iter->sg = umem->sg_head.sgl; + sg_phys_iter->supported_pgsz = supported_pgsz; + + /* Single page support in MR */ + if (hweight_long(supported_pgsz) == 1) + sg_phys_iter->pg_bit = fls64(supported_pgsz) - 1; + else + sg_phys_iter->mixed_pg_support = true; +} +EXPORT_SYMBOL(ib_umem_start_phys_iter); + +/** + * ib_umem_next_phys_iter - SG list iterator that returns aligned HW address + * @umem: umem struct + * @sg_phys_iter: SG HW address iterator + * @supported_pgsz: bitmask of HW supported page sizes + * + * This helper iterates over the SG list and returns the HW + * address aligned to a supported HW page size. + */ +bool ib_umem_next_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter) +{ + unsigned long pg_mask; + + if (!sg_phys_iter->sg || !sg_phys_iter->supported_pgsz) + return false; + + if (sg_phys_iter->remaining) { + sg_phys_iter->phyaddr += (sg_phys_iter->len + sg_phys_iter->offset); + } else { + sg_phys_iter->phyaddr = sg_dma_address(sg_phys_iter->sg); + sg_phys_iter->remaining = sg_dma_len(sg_phys_iter->sg); + } + + /* Mixed page support in MR*/ + if (sg_phys_iter->mixed_pg_support) + sg_phys_iter->pg_bit = ib_umem_find_mixed_pg_bit(umem->sg_head.sgl, + sg_phys_iter); + + pg_mask = ~(BIT_ULL(sg_phys_iter->pg_bit) - 1); + + sg_phys_iter->offset = sg_phys_iter->phyaddr & ~pg_mask; + sg_phys_iter->phyaddr = sg_phys_iter->phyaddr & pg_mask; + sg_phys_iter->len = min_t(unsigned long, sg_phys_iter->remaining, + BIT_ULL(sg_phys_iter->pg_bit) - sg_phys_iter->offset); + sg_phys_iter->remaining -= sg_phys_iter->len; + if (!sg_phys_iter->remaining) + sg_phys_iter->sg = sg_next(sg_phys_iter->sg); + + return true; +} +EXPORT_SYMBOL(ib_umem_next_phys_iter); + /** * ib_umem_get - Pin and DMA map userspace memory. 
* diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h index 3e8e1ed..57cc621 100644 --- a/include/rdma/ib_umem.h +++ b/include/rdma/ib_umem.h @@ -55,6 +55,17 @@ struct ib_umem { int npages; }; +struct sg_phys_iter { + struct scatterlist *sg; + unsigned long phyaddr; + unsigned long len; + unsigned long offset; + unsigned long supported_pgsz; + unsigned long remaining; + unsigned int pg_bit; + u8 mixed_pg_support; +}; + /* Returns the offset of the umem start relative to the first page. */ static inline int ib_umem_offset(struct ib_umem *umem) { @@ -89,6 +100,11 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset, unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem, unsigned long supported_pgsz, unsigned long uvirt_addr); +void ib_umem_start_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter, + unsigned long supported_pgsz); +bool ib_umem_next_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter); #else /* CONFIG_INFINIBAND_USER_MEM */ @@ -110,6 +126,13 @@ static inline int ib_umem_find_single_pg_size(struct ib_umem *umem, unsigned long uvirt_addr) { return -EINVAL; } +static inline void ib_umem_start_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter, + unsigned long supported_pgsz) { } +static inline bool ib_umem_next_phys_iter(struct ib_umem *umem, + struct sg_phys_iter *sg_phys_iter) { + return false; +} #endif /* CONFIG_INFINIBAND_USER_MEM */

From patchwork Mon Dec 24 22:32:25 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742487
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 4/6] RDMA/i40iw: Use umem API to retrieve optimal HW address
Date: Mon,
24 Dec 2018 16:32:25 -0600 Message-Id: <20181224223227.18016-5-shiraz.saleem@intel.com> X-Mailer: git-send-email 2.8.3 In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com> References: <20181224223227.18016-1-shiraz.saleem@intel.com> Sender: linux-rdma-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-rdma@vger.kernel.org X-Virus-Scanned: ClamAV using ClamSMTP Call the core helpers to retrieve the optimal HW aligned address to use for the MR, within a supported i40iw page size. Remove code in i40iw to determine when MR is backed by 2M huge pages which involves checking the umem->hugetlb flag and VMA inspection. The core helpers will return the 2M aligned address if the MR is backed by 2M pages. Fixes: f26c7c83395b ("i40iw: Add 2MB page support") Reviewed-by: Michael J. Ruhl Signed-off-by: Shiraz Saleem --- drivers/infiniband/hw/i40iw/i40iw_user.h | 5 ++++ drivers/infiniband/hw/i40iw/i40iw_verbs.c | 49 ++++++------------------------- drivers/infiniband/hw/i40iw/i40iw_verbs.h | 3 +- 3 files changed, 15 insertions(+), 42 deletions(-) diff --git a/drivers/infiniband/hw/i40iw/i40iw_user.h b/drivers/infiniband/hw/i40iw/i40iw_user.h index b125925..09fdcee 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_user.h +++ b/drivers/infiniband/hw/i40iw/i40iw_user.h @@ -80,6 +80,11 @@ enum i40iw_device_capabilities_const { I40IW_MAX_PDS = 32768 }; +enum i40iw_supported_page_size { + I40IW_PAGE_SZ_4K = 0x00001000, + I40IW_PAGE_SZ_2M = 0x00200000 +}; + #define i40iw_handle void * #define i40iw_adapter_handle i40iw_handle #define i40iw_qp_handle i40iw_handle diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c index 3e7a299..2471fed 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c +++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c @@ -1371,53 +1371,22 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr, struct i40iw_pbl *iwpbl = &iwmr->iwpbl; struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc; struct i40iw_pble_info *pinfo; - struct sg_page_iter sg_iter; - u64 pg_addr = 0; + struct sg_phys_iter sg_phys_iter; u32 idx = 0; - bool first_pg = true; pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf; if (iwmr->type == IW_MEMREG_TYPE_QP) iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl); - for_each_sg_page(region->sg_head.sgl, &sg_iter, region->nmap, 0) { - pg_addr = sg_page_iter_dma_address(&sg_iter); - if (first_pg) - *pbl = cpu_to_le64(pg_addr & iwmr->page_msk); - else if (!(pg_addr & ~iwmr->page_msk)) - *pbl = cpu_to_le64(pg_addr); - else - continue; - - first_pg = false; + for (ib_umem_start_phys_iter(region, &sg_phys_iter, iwmr->page_size); + ib_umem_next_phys_iter(region, &sg_phys_iter);) { + *pbl = cpu_to_le64(sg_phys_iter.phyaddr); pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx); } } /** - * i40iw_set_hugetlb_params - set MR pg size and mask to huge pg values. 
diff --git a/drivers/infiniband/hw/i40iw/i40iw_user.h b/drivers/infiniband/hw/i40iw/i40iw_user.h
index b125925..09fdcee 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_user.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_user.h
@@ -80,6 +80,11 @@ enum i40iw_device_capabilities_const {
 	I40IW_MAX_PDS = 32768
 };
 
+enum i40iw_supported_page_size {
+	I40IW_PAGE_SZ_4K = 0x00001000,
+	I40IW_PAGE_SZ_2M = 0x00200000
+};
+
 #define i40iw_handle void *
 #define i40iw_adapter_handle i40iw_handle
 #define i40iw_qp_handle i40iw_handle
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.c b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
index 3e7a299..2471fed 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.c
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.c
@@ -1371,53 +1371,22 @@ static void i40iw_copy_user_pgaddrs(struct i40iw_mr *iwmr,
 	struct i40iw_pbl *iwpbl = &iwmr->iwpbl;
 	struct i40iw_pble_alloc *palloc = &iwpbl->pble_alloc;
 	struct i40iw_pble_info *pinfo;
-	struct sg_page_iter sg_iter;
-	u64 pg_addr = 0;
+	struct sg_phys_iter sg_phys_iter;
 	u32 idx = 0;
-	bool first_pg = true;
 
 	pinfo = (level == I40IW_LEVEL_1) ? NULL : palloc->level2.leaf;
 
 	if (iwmr->type == IW_MEMREG_TYPE_QP)
 		iwpbl->qp_mr.sq_page = sg_page(region->sg_head.sgl);
 
-	for_each_sg_page(region->sg_head.sgl, &sg_iter, region->nmap, 0) {
-		pg_addr = sg_page_iter_dma_address(&sg_iter);
-		if (first_pg)
-			*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
-		else if (!(pg_addr & ~iwmr->page_msk))
-			*pbl = cpu_to_le64(pg_addr);
-		else
-			continue;
-
-		first_pg = false;
+	for (ib_umem_start_phys_iter(region, &sg_phys_iter, iwmr->page_size);
+	     ib_umem_next_phys_iter(region, &sg_phys_iter);) {
+		*pbl = cpu_to_le64(sg_phys_iter.phyaddr);
 		pbl = i40iw_next_pbl_addr(pbl, &pinfo, &idx);
 	}
 }
 
 /**
- * i40iw_set_hugetlb_params - set MR pg size and mask to huge pg values.
- * @addr: virtual address
- * @iwmr: mr pointer for this memory registration
- */
-static void i40iw_set_hugetlb_values(u64 addr, struct i40iw_mr *iwmr)
-{
-	struct vm_area_struct *vma;
-	struct hstate *h;
-
-	down_read(&current->mm->mmap_sem);
-	vma = find_vma(current->mm, addr);
-	if (vma && is_vm_hugetlb_page(vma)) {
-		h = hstate_vma(vma);
-		if (huge_page_size(h) == 0x200000) {
-			iwmr->page_size = huge_page_size(h);
-			iwmr->page_msk = huge_page_mask(h);
-		}
-	}
-	up_read(&current->mm->mmap_sem);
-}
-
-/**
  * i40iw_check_mem_contiguous - check if pbls stored in arr are contiguous
  * @arr: lvl1 pbl array
  * @npages: page count
@@ -1871,11 +1840,11 @@ static struct ib_mr *i40iw_reg_user_mr(struct ib_pd *pd,
 	iwmr->ibmr.device = pd->device;
 	ucontext = to_ucontext(pd->uobject->context);
 
-	iwmr->page_size = PAGE_SIZE;
-	iwmr->page_msk = PAGE_MASK;
-
-	if (region->hugetlb && (req.reg_type == IW_MEMREG_TYPE_MEM))
-		i40iw_set_hugetlb_values(start, iwmr);
+	iwmr->page_size = I40IW_PAGE_SZ_4K;
+	if (req.reg_type == IW_MEMREG_TYPE_MEM)
+		iwmr->page_size = ib_umem_find_single_pg_size(region,
+					I40IW_PAGE_SZ_4K | I40IW_PAGE_SZ_2M,
+					virt);
 
 	region_length = region->length + (start & (iwmr->page_size - 1));
 	pg_shift = ffs(iwmr->page_size) - 1;
diff --git a/drivers/infiniband/hw/i40iw/i40iw_verbs.h b/drivers/infiniband/hw/i40iw/i40iw_verbs.h
index 76cf173..3a41375 100644
--- a/drivers/infiniband/hw/i40iw/i40iw_verbs.h
+++ b/drivers/infiniband/hw/i40iw/i40iw_verbs.h
@@ -94,8 +94,7 @@ struct i40iw_mr {
 	struct ib_umem *region;
 	u16 type;
 	u32 page_cnt;
-	u32 page_size;
-	u64 page_msk;
+	u64 page_size;
 	u32 npages;
 	u32 stag;
 	u64 length;
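A side-by-side model of the loop change above: the old i40iw code
walked every 4K page and kept only DMA addresses aligned to the MR page
mask, while the new iterator emits one address per aligned block. The
userspace simulation below (illustrative base address; 4K/2M sizes
only) checks that both strategies produce the same single PBL entry for
a physically contiguous 2M region:

	/* Simulation: old per-4K-page filtering vs. new per-block
	 * iteration over one physically contiguous 2M region.
	 */
	#include <assert.h>

	#define PAGE_SZ_4K 0x1000ULL
	#define PAGE_SZ_2M 0x200000ULL

	int main(void)
	{
		unsigned long long base = 0x1200000ULL; /* 2M-aligned, example */
		unsigned long long msk = ~(PAGE_SZ_2M - 1);
		unsigned long long pg, old_entries = 0, new_entries = 0;

		/* Old style: visit all 512 4K pages, keep 2M-aligned ones. */
		for (pg = base; pg < base + PAGE_SZ_2M; pg += PAGE_SZ_4K)
			if (!(pg & ~msk))
				old_entries++;

		/* New style: one iterator step per 2M block. */
		for (pg = base; pg < base + PAGE_SZ_2M; pg += PAGE_SZ_2M)
			new_entries++;

		assert(old_entries == 1 && new_entries == 1);
		return 0;
	}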
From patchwork Mon Dec 24 22:32:26 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742483
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 5/6] RDMA/bnxt_re: Use umem APIs to retrieve optimal
 HW address
Date: Mon, 24 Dec 2018 16:32:26 -0600
Message-Id: <20181224223227.18016-6-shiraz.saleem@intel.com>
In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com>
References: <20181224223227.18016-1-shiraz.saleem@intel.com>

Call the core helpers to retrieve the optimal HW-aligned address to use
for the MR, within a supported bnxt_re page size.

Remove the check of the umem->hugetlb flag, as it is no longer
required; the core helpers return the 2M-aligned address when the MR is
backed by 2M huge pages.

Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/hw/bnxt_re/ib_verbs.c | 28 +++++++++++-----------------
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
index fa0cb7c..6692435 100644
--- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
+++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
@@ -3535,17 +3535,13 @@ static int fill_umem_pbl_tbl(struct ib_umem *umem, u64 *pbl_tbl_orig,
 			     int page_shift)
 {
 	u64 *pbl_tbl = pbl_tbl_orig;
-	u64 paddr;
-	u64 page_mask = (1ULL << page_shift) - 1;
-	struct sg_page_iter sg_iter;
+	u64 page_size = BIT_ULL(page_shift);
+	struct sg_phys_iter sg_phys_iter;
+
+	for (ib_umem_start_phys_iter(umem, &sg_phys_iter, page_size);
+	     ib_umem_next_phys_iter(umem, &sg_phys_iter);)
+		*pbl_tbl++ = sg_phys_iter.phyaddr;
 
-	for_each_sg_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0) {
-		paddr = sg_page_iter_dma_address(&sg_iter);
-		if (pbl_tbl == pbl_tbl_orig)
-			*pbl_tbl++ = paddr & ~page_mask;
-		else if ((paddr & page_mask) == 0)
-			*pbl_tbl++ = paddr;
-	}
 	return pbl_tbl - pbl_tbl_orig;
 }
 
@@ -3608,7 +3604,9 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 		goto free_umem;
 	}
 
-	page_shift = umem->page_shift;
+	page_shift = __ffs(ib_umem_find_single_pg_size(umem,
+				BNXT_RE_PAGE_SIZE_4K | BNXT_RE_PAGE_SIZE_2M,
+				virt_addr));
 
 	if (!bnxt_re_page_size_ok(page_shift)) {
 		dev_err(rdev_to_dev(rdev), "umem page size unsupported!");
@@ -3616,17 +3614,13 @@ struct ib_mr *bnxt_re_reg_user_mr(struct ib_pd *ib_pd, u64 start, u64 length,
 		goto fail;
 	}
 
-	if (!umem->hugetlb && length > BNXT_RE_MAX_MR_SIZE_LOW) {
+	if (page_shift == BNXT_RE_PAGE_SHIFT_4K &&
+	    length > BNXT_RE_MAX_MR_SIZE_LOW) {
 		dev_err(rdev_to_dev(rdev), "Requested MR Sz:%llu Max sup:%llu",
 			length, (u64)BNXT_RE_MAX_MR_SIZE_LOW);
 		rc = -EINVAL;
 		goto fail;
 	}
-	if (umem->hugetlb && length > BNXT_RE_PAGE_SIZE_2M) {
-		page_shift = BNXT_RE_PAGE_SHIFT_2M;
-		dev_warn(rdev_to_dev(rdev), "umem hugetlb set page_size %x",
-			 1 << page_shift);
-	}
 
 	/* Map umem buf ptrs to the PBL */
 	umem_pgs = fill_umem_pbl_tbl(umem, pbl_tbl, page_shift);
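Worth noting in the hunk above: the BNXT_RE_MAX_MR_SIZE_LOW cap that
used to key off !umem->hugetlb now keys off the page shift picked by
the core helper, so a 2M-backed MR above that size is still accepted. A
userspace round-trip check of the __ffs()/BIT_ULL() arithmetic (the
BNXT_RE_* values here are assumptions based on their names, 4K and 2M,
not taken from the bnxt_re headers):

	/* Round-trip check of the shift/size arithmetic in the hunk
	 * above; constants are assumed values for illustration.
	 */
	#include <assert.h>

	#define BNXT_RE_PAGE_SIZE_4K  0x1000ULL
	#define BNXT_RE_PAGE_SIZE_2M  0x200000ULL
	#define BNXT_RE_PAGE_SHIFT_4K 12

	static unsigned int my_ffs64(unsigned long long v) /* stand-in for __ffs() */
	{
		return (unsigned int)__builtin_ctzll(v);
	}

	int main(void)
	{
		unsigned int page_shift = my_ffs64(BNXT_RE_PAGE_SIZE_2M);

		assert(my_ffs64(BNXT_RE_PAGE_SIZE_4K) == BNXT_RE_PAGE_SHIFT_4K);
		assert(page_shift == 21);
		assert((1ULL << page_shift) == BNXT_RE_PAGE_SIZE_2M);
		/* 2M selected: the 4K-only length cap does not apply. */
		assert(page_shift != BNXT_RE_PAGE_SHIFT_4K);
		return 0;
	}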
From patchwork Mon Dec 24 22:32:27 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742489
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 6/6] RDMA/umem: Remove hugetlb flag
Date: Mon, 24 Dec 2018 16:32:27 -0600
Message-Id: <20181224223227.18016-7-shiraz.saleem@intel.com>
In-Reply-To: <20181224223227.18016-1-shiraz.saleem@intel.com>
References: <20181224223227.18016-1-shiraz.saleem@intel.com>

The i40iw and bnxt_re drivers no longer depend on the hugetlb flag, so
remove the flag from the ib_umem structure.

Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c     | 15 ---------------
 drivers/infiniband/core/umem_odp.c |  3 ---
 include/rdma/ib_umem.h             |  1 -
 3 files changed, 19 deletions(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 8f8a1f6..19534ef 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -306,7 +306,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	struct mm_struct *mm;
 	unsigned long npages;
 	int ret;
-	int i;
 	unsigned long dma_attrs = 0;
 	struct scatterlist *sg_cur, *sg_nxt;
 	unsigned int gup_flags = FOLL_WRITE;
@@ -351,9 +350,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		return umem;
 	}
 
-	/* We assume the memory is from hugetlb until proved otherwise */
-	umem->hugetlb = 1;
-
 	page_list = (struct page **) __get_free_page(GFP_KERNEL);
 	if (!page_list) {
 		ret = -ENOMEM;
@@ -365,8 +361,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 	 * just assume the memory is not hugetlb memory
 	 */
 	vma_list = (struct vm_area_struct **) __get_free_page(GFP_KERNEL);
-	if (!vma_list)
-		umem->hugetlb = 0;
 
 	npages = ib_umem_num_pages(umem);
 	if (npages == 0 || npages > UINT_MAX) {
@@ -414,15 +408,6 @@ struct ib_umem *ib_umem_get(struct ib_ucontext *context, unsigned long addr,
 		npages -= ret;
 
 		ib_umem_add_sg_table(&sg_cur, &sg_nxt, page_list, ret);
-
-		/* Continue to hold the mmap_sem as vma_list access
-		 * needs to be protected.
-		 */
-		for (i = 0; i < ret && umem->hugetlb; i++) {
-			if (vma_list && !is_vm_hugetlb_page(vma_list[i]))
-				umem->hugetlb = 0;
-		}
-
 		up_read(&mm->mmap_sem);
 	}
 
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 9608681..33c188e 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -416,9 +416,6 @@ int ib_umem_odp_get(struct ib_umem_odp *umem_odp, int access)
 		h = hstate_vma(vma);
 		umem->page_shift = huge_page_shift(h);
 		up_read(&mm->mmap_sem);
-		umem->hugetlb = 1;
-	} else {
-		umem->hugetlb = 0;
 	}
 
 	mutex_init(&umem_odp->umem_mutex);
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 57cc621..770d249 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -47,7 +47,6 @@ struct ib_umem {
 	unsigned long address;
 	int page_shift;
 	u32 writable : 1;
-	u32 hugetlb : 1;
 	u32 is_odp : 1;
 	struct work_struct work;
 	struct sg_table sg_head;
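After this removal, a driver that still needs to know whether an MR can
be treated as 2M-backed derives that from the page size the core
selects rather than from a cached bit. A hedged kernel-context sketch
using this series' helper (the I40IW_PAGE_SZ_* constants come from
patch 4/6; mr_uses_2m_pages itself is illustrative, not part of the
series):

	/* Sketch: deciding on 2M treatment without umem->hugetlb. The
	 * helper returns the best single page size supported by both
	 * the MR layout and the given bitmap of HW page sizes.
	 */
	static bool mr_uses_2m_pages(struct ib_umem *umem, u64 virt)
	{
		unsigned long pg_sz;

		pg_sz = ib_umem_find_single_pg_size(umem,
						    I40IW_PAGE_SZ_4K |
						    I40IW_PAGE_SZ_2M,
						    virt);
		return pg_sz == I40IW_PAGE_SZ_2M;
	}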