From patchwork Fri Oct 19 23:34:07 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650227
From: Shiraz Saleem <shiraz.saleem@intel.com>
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 2/4] RDMA/umem: Add API to find best driver supported
 page size in an MR
Date: Fri, 19 Oct 2018 18:34:07 -0500
Message-Id: <20181019233409.1104-3-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>

This helper iterates through the SG list to find the best page size to
use from a bitmap of HW supported page sizes. Drivers that support
multiple page sizes, but not mixed pages in an MR, can call this API.

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c | 95 ++++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h         |  7 ++++
 2 files changed, 102 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 486d6d7..04071b5 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -66,6 +66,101 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d
 }
 
 /**
+ * ib_umem_find_pg_bit - Find the page bit to use for phyaddr
+ *
+ * @phyaddr: Physical address after DMA translation
+ * @supported_pgsz: bitmask of HW supported page sizes
+ */
+static int ib_umem_find_pg_bit(unsigned long phyaddr,
+			       unsigned long supported_pgsz)
+{
+	unsigned long num_zeroes;
+	int pg_bit;
+
+	/* Trailing zero bits in the address */
+	num_zeroes = __ffs(phyaddr);
+
+	/* Find page bit such that phyaddr is aligned to the highest supported
+	 * HW page size
+	 */
+	pg_bit = fls64(supported_pgsz & (BIT_ULL(num_zeroes + 1) - 1)) - 1;
+
+	return pg_bit;
+}
+
+/**
+ * ib_umem_find_single_pg_size - Find best HW page size to use for this MR
+ * @umem: umem struct
+ * @supported_pgsz: bitmask of HW supported page sizes
+ *
+ * This helper is intended for HW that supports multiple page
+ * sizes but can do only a single page size in an MR.
+ */
+unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
+					  unsigned long supported_pgsz)
+{
+	struct scatterlist *sg;
+	unsigned long dma_addr_start, dma_addr_end;
+	unsigned long uvirt_offset, phy_offset;
+	unsigned long pg_mask, bitmap;
+	int pg_bit_start, pg_bit_end, pg_bit_sg_chunk;
+	int lowest_pg_bit, best_pg_bit;
+	int i;
+
+	if (!supported_pgsz)
+		return 0;
+
+	lowest_pg_bit = __ffs(supported_pgsz);
+	best_pg_bit = fls64(supported_pgsz) - 1;
+
+	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.orig_nents, i) {
+		dma_addr_start = sg_dma_address(sg);
+		dma_addr_end = sg_dma_address(sg) + sg_dma_len(sg);
+		pg_bit_start = ib_umem_find_pg_bit(dma_addr_start, supported_pgsz);
+		pg_bit_end = ib_umem_find_pg_bit(dma_addr_end, supported_pgsz);
+
+		if (!i) {
+			pg_bit_sg_chunk = max_t(int, pg_bit_start, pg_bit_end);
+			bitmap = supported_pgsz;
+			/* The start offset of the MR into a first _large_ page
+			 * should line up exactly for the user-space virtual buf
+			 * and physical buffer, in order to upgrade the page bit
+			 */
+			while (pg_bit_sg_chunk > PAGE_SHIFT) {
+				pg_mask = ~(BIT_ULL(pg_bit_sg_chunk) - 1);
+				uvirt_offset = umem->address & ~pg_mask;
+				phy_offset = (dma_addr_start + ib_umem_offset(umem)) &
+					      ~pg_mask;
+				if (uvirt_offset == phy_offset)
+					break;
+
+				/* Retry with next supported page size */
+				clear_bit(pg_bit_sg_chunk, &bitmap);
+				pg_bit_sg_chunk = fls64(bitmap) - 1;
+			}
+		} else if (i == (umem->sg_head.orig_nents - 1)) {
+			/* last SG chunk: Does not matter if MR ends at an
+			 * unaligned offset.
+			 */
+			pg_bit_sg_chunk = pg_bit_start;
+		} else {
+			pg_bit_sg_chunk = min_t(int, pg_bit_start, pg_bit_end);
+		}
+
+		best_pg_bit = min_t(int, best_pg_bit, pg_bit_sg_chunk);
+		if (best_pg_bit == lowest_pg_bit)
+			break;
+	}
+
+	/* best page bit cannot be less than the lowest supported HW size */
+	if (best_pg_bit < lowest_pg_bit)
+		return BIT_ULL(lowest_pg_bit);
+
+	return BIT_ULL(best_pg_bit);
+}
+EXPORT_SYMBOL(ib_umem_find_single_pg_size);
+
+/**
  * ib_umem_get - Pin and DMA map userspace memory.
  *
  * If access flags indicate ODP memory, avoid pinning. Instead, stores
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 5d3755e..24ba6c6 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -86,6 +86,8 @@ void ib_umem_release(struct ib_umem *umem);
 int ib_umem_page_count(struct ib_umem *umem);
 int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
+unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
+					  unsigned long supported_pgsz);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -102,6 +104,11 @@ static inline int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offs
 			size_t length) {
 	return -EINVAL;
 }
+static inline unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
+							unsigned long supported_pgsz) {
+	return 0;
+}
+
 #endif /* CONFIG_INFINIBAND_USER_MEM */

 #endif /* IB_UMEM_H */