From patchwork Fri Oct 19 23:34:08 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10650223
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH RFC 3/4] RDMA/umem: Add API to return optimal HW DMA addresses from SG list
Date: Fri, 19 Oct 2018 18:34:08 -0500
Message-Id: <20181019233409.1104-4-shiraz.saleem@intel.com>
In-Reply-To: <20181019233409.1104-1-shiraz.saleem@intel.com>
References: <20181019233409.1104-1-shiraz.saleem@intel.com>

This helper iterates over the SG list and returns suitable HW-aligned
DMA addresses within a driver-supported page size. The implementation
is intended to work for HW that supports either a single page size or
mixed page sizes in an MR. This avoids the need for driver-specific
algorithms that achieve the same thing, and for redundant walks of the
SG list.

Suggested-by: Jason Gunthorpe
Reviewed-by: Michael J. Ruhl
Signed-off-by: Shiraz Saleem
---
 drivers/infiniband/core/umem.c | 68 ++++++++++++++++++++++++++++++++++++++++++
 include/rdma/ib_umem.h         | 19 ++++++++++++
 2 files changed, 87 insertions(+)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 04071b5..cba79ab 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -160,6 +160,74 @@ unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
 }
 EXPORT_SYMBOL(ib_umem_find_single_pg_size);
 
+void ib_umem_start_phys_iter(struct ib_umem *umem,
+			     struct sg_phys_iter *sg_phys_iter)
+{
+	memset(sg_phys_iter, 0, sizeof(struct sg_phys_iter));
+	sg_phys_iter->sg = umem->sg_head.sgl;
+}
+EXPORT_SYMBOL(ib_umem_start_phys_iter);
+
+/**
+ * ib_umem_next_phys_iter - SG list iterator that returns aligned HW address
+ * @umem: umem struct
+ * @sg_phys_iter: SG HW address iterator
+ * @supported_pgsz: bitmask of HW supported page sizes
+ *
+ * This helper iterates over the SG list and returns the HW
+ * address aligned to a supported HW page size.
+ *
+ * The algorithm differs slightly between HW that supports a single
+ * page size and HW that supports mixed page sizes in an MR. For
+ * example, if an MR of size 4M-4K starts at an offset of PAGE_SIZE
+ * (e.g. 4K) into a 2M page, HW that supports multiple page sizes
+ * (e.g. 4K, 2M) would get back 511 4K pages and one 2M page. HW that
+ * supports only a single page size would get back two 2M pages or
+ * 1023 4K pages.
+ */
+bool ib_umem_next_phys_iter(struct ib_umem *umem,
+			    struct sg_phys_iter *sg_phys_iter,
+			    unsigned long supported_pgsz)
+{
+	unsigned long pg_mask, offset;
+	int pg_bit;
+
+	if (!sg_phys_iter->sg || !supported_pgsz)
+		return false;
+
+	if (sg_phys_iter->remaining) {
+		sg_phys_iter->phyaddr += sg_phys_iter->len;
+	} else {
+		sg_phys_iter->phyaddr = sg_dma_address(sg_phys_iter->sg);
+		sg_phys_iter->remaining = sg_dma_len(sg_phys_iter->sg);
+	}
+
+	/* Single page support in MR */
+	if (hweight_long(supported_pgsz) == 1) {
+		pg_bit = fls64(supported_pgsz) - 1;
+	} else {
+		/* Mixed page support in MR */
+		pg_bit = ib_umem_find_pg_bit(sg_phys_iter->phyaddr,
+					     supported_pgsz);
+	}
+
+	/* page bit cannot be less than the lowest supported HW size */
+	if (WARN_ON(pg_bit < __ffs(supported_pgsz)))
+		return false;
+
+	pg_mask = ~((1UL << pg_bit) - 1);
+
+	offset = sg_phys_iter->phyaddr & ~pg_mask;
+	sg_phys_iter->phyaddr = sg_phys_iter->phyaddr & pg_mask;
+	sg_phys_iter->len = min_t(size_t, sg_phys_iter->remaining,
+				  (1UL << pg_bit) - offset);
+	sg_phys_iter->remaining -= sg_phys_iter->len;
+	if (!sg_phys_iter->remaining)
+		sg_phys_iter->sg = sg_next(sg_phys_iter->sg);
+
+	return true;
+}
+EXPORT_SYMBOL(ib_umem_next_phys_iter);
+
 /**
  * ib_umem_get - Pin and DMA map userspace memory.
  *
diff --git a/include/rdma/ib_umem.h b/include/rdma/ib_umem.h
index 24ba6c6..8114fd1 100644
--- a/include/rdma/ib_umem.h
+++ b/include/rdma/ib_umem.h
@@ -55,6 +55,13 @@ struct ib_umem {
 	int npages;
 };
 
+struct sg_phys_iter {
+	struct scatterlist *sg;
+	unsigned long phyaddr;
+	size_t len;
+	unsigned int remaining;
+};
+
 /* Returns the offset of the umem start relative to the first page. */
 static inline int ib_umem_offset(struct ib_umem *umem)
 {
@@ -88,6 +95,11 @@ int ib_umem_copy_from(void *dst, struct ib_umem *umem, size_t offset,
 		      size_t length);
 unsigned long ib_umem_find_single_pg_size(struct ib_umem *umem,
 					  unsigned long supported_pgsz);
+void ib_umem_start_phys_iter(struct ib_umem *umem,
+			     struct sg_phys_iter *sg_phys_iter);
+bool ib_umem_next_phys_iter(struct ib_umem *umem,
+			    struct sg_phys_iter *sg_phys_iter,
+			    unsigned long supported_pgsz);
 
 #else /* CONFIG_INFINIBAND_USER_MEM */
 
@@ -108,6 +120,13 @@ static inline int ib_umem_find_single_pg_size(struct ib_umem *umem,
 					      unsigned long supported_pgsz) {
 	return -EINVAL;
 }
+static inline void ib_umem_start_phys_iter(struct ib_umem *umem,
+					   struct sg_phys_iter *sg_phys_iter) { }
+static inline bool ib_umem_next_phys_iter(struct ib_umem *umem,
+					  struct sg_phys_iter *sg_phys_iter,
+					  unsigned long supported_pgsz) {
+	return false;
+}
 
 #endif /* CONFIG_INFINIBAND_USER_MEM */