From patchwork Mon Dec 24 22:32:21 2018
X-Patchwork-Submitter: Shiraz Saleem
X-Patchwork-Id: 10742479
From: Shiraz Saleem
To: dledford@redhat.com, jgg@ziepe.ca
Cc: linux-rdma@vger.kernel.org, Shiraz Saleem
Subject: [PATCH rdma-next 0/6] Introduce APIs to get DMA addresses aligned to a HW supported page size
Date: Mon, 24 Dec 2018 16:32:21 -0600
Message-Id: <20181224223227.18016-1-shiraz.saleem@intel.com>
List-ID: 
X-Mailing-List: linux-rdma@vger.kernel.org

This patch set aims to allow drivers that support multiple page sizes to
leverage the core umem APIs to obtain suitable HW DMA addresses for the MR,
aligned to a supported page size. The APIs accommodate HW that supports
either a single page size or mixed page sizes in an MR. The motivation for
this work comes from the discussion in [1].

The first patch modifies the current memory registration API ib_umem_get()
to combine contiguous regions into SGEs and add them to the scatter table.
Driver call-sites are updated to use the for_each_sg_page iterator where
applicable.

The second patch introduces a new core API that allows drivers to find the
best supported page size to use for this MR, from a bitmap of HW supported
page sizes.

The third patch introduces new core APIs that iterate through the SG list
and return suitable HW DMA addresses aligned to a driver supported page
size.

The fourth and fifth patches remove the dependency of the i40iw and bnxt_re
drivers on the hugetlb flag. The new core APIs are called in these drivers
to get huge page size aligned addresses if the MR is backed by huge pages.

The sixth patch removes the hugetlb flag from the IB core.

Please note that the mixed page portion of the algorithm and the bnxt_re
update in patch #5 have not been tested on hardware.

[1] https://patchwork.kernel.org/patch/10499753/

RFC-->v0:
---------
* Add to scatter table by iterating a limited sized page list.
* Updated driver call sites to use the for_each_sg_page iterator variant
  where applicable.
* Tweaked the algorithm in ib_umem_find_single_pg_size and
  ib_umem_next_phys_iter to ignore the alignment of the start of the first
  SGE and the end of the last SGE.
* Simplified the offset alignment checks in ib_umem_find_single_pg_size for
  the user-space virtual and physical buffer.
* Updated ib_umem_start_phys_iter to do some pre-computation for the
  non-mixed page support case.
* Updated the bnxt_re driver to use the new core APIs and removed its
  dependency on the hugetlb flag.
* Fixed a bug in the computation of sg_phys_iter->phyaddr in
  ib_umem_next_phys_iter.
* Dropped hugetlb flag usage from the RDMA subsystem.
* Rebased on top of for-next.

Shiraz Saleem (6):
  RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs
  RDMA/umem: Add API to find best driver supported page size in an MR
  RDMA/umem: Add API to return optimal HW DMA addresses from SG list
  RDMA/i40iw: Use umem API to retrieve optimal HW address
  RDMA/bnxt_re: Use umem APIs to retrieve optimal HW address
  RDMA/umem: Remove hugetlb flag

 drivers/infiniband/core/umem.c                 | 260 ++++++++++++++++++++++---
 drivers/infiniband/core/umem_odp.c             |   3 -
 drivers/infiniband/hw/bnxt_re/ib_verbs.c       |  35 ++--
 drivers/infiniband/hw/bnxt_re/qplib_res.c      |   9 +-
 drivers/infiniband/hw/cxgb3/iwch_provider.c    |  27 ++-
 drivers/infiniband/hw/cxgb4/mem.c              |  31 ++-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c     |   7 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c     |  25 +--
 drivers/infiniband/hw/hns/hns_roce_mr.c        |  88 ++++-----
 drivers/infiniband/hw/i40iw/i40iw_user.h       |   5 +
 drivers/infiniband/hw/i40iw/i40iw_verbs.c      |  58 ++----
 drivers/infiniband/hw/i40iw/i40iw_verbs.h      |   3 +-
 drivers/infiniband/hw/mthca/mthca_provider.c   |  33 ++--
 drivers/infiniband/hw/nes/nes_verbs.c          | 203 +++++++++----------
 drivers/infiniband/hw/ocrdma/ocrdma_verbs.c    |  54 +++--
 drivers/infiniband/hw/qedr/verbs.c             |  56 +++---
 drivers/infiniband/hw/vmw_pvrdma/pvrdma_misc.c |  21 +-
 drivers/infiniband/sw/rdmavt/mr.c              |   8 +-
 drivers/infiniband/sw/rxe/rxe_mr.c             |   7 +-
 include/rdma/ib_umem.h                         |  33 +++-
 20 files changed, 541 insertions(+), 425 deletions(-)
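To illustrate the SGE-combining idea from patch #1, here is a minimal
userspace sketch (not the kernel code: the helper name, the struct, and the
fixed 4 KB page size are all assumptions for illustration; ib_umem_get()
does this while building a scatter table). Physically contiguous PAGE_SIZE
entries are merged into single (addr, len) segments:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

struct seg {
	uint64_t addr;
	uint64_t len;
};

/* Hypothetical helper: walk a list of PAGE_SIZE page addresses and
 * coalesce physically contiguous runs into single (addr, len) entries.
 * Returns the number of segments produced; 'out' must have room for
 * up to 'npages' entries (worst case: nothing is contiguous). */
static size_t coalesce_pages(const uint64_t *pages, size_t npages,
			     struct seg *out)
{
	size_t nseg = 0;

	for (size_t i = 0; i < npages; i++) {
		if (nseg && out[nseg - 1].addr + out[nseg - 1].len == pages[i]) {
			/* Page continues the current run: extend it. */
			out[nseg - 1].len += PAGE_SIZE;
		} else {
			/* Discontiguity: start a new segment. */
			out[nseg].addr = pages[i];
			out[nseg].len = PAGE_SIZE;
			nseg++;
		}
	}
	return nseg;
}
```

With this, five pinned pages at 0x1000, 0x2000, 0x3000, 0x10000, 0x11000
collapse into two segments instead of five single-page SGEs, which is what
lets the later patches find large aligned blocks.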
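And a rough userspace sketch of what the page-size selection in patch #2
does conceptually: pick the largest HW-supported page size compatible with
the buffer's alignment. The function name and the simplified alignment rule
(requiring the chosen size to divide both the start address and the length,
i.e. ignoring the relaxed treatment of the first and last SGE described in
the changelog) are illustrative assumptions, not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: 'mask' has one bit set per HW-supported page size,
 * 'va' is the user virtual address, 'len' the MR length. A size is
 * usable here only if it divides both va and len; the real core API is
 * more permissive about the ends of the region. Returns 0 if no
 * supported size fits. */
static uint64_t best_page_size(uint64_t mask, uint64_t va, uint64_t len)
{
	uint64_t bits = va | len;
	/* Lowest set bit of 'bits' is the largest power of two that
	 * divides both va and len, so it bounds the usable size. */
	uint64_t limit = bits ? (bits & -bits) : UINT64_MAX;
	/* Keep only supported sizes no larger than 'limit'. */
	uint64_t ok = mask & (limit >= (1ULL << 63) ? UINT64_MAX
						    : (limit << 1) - 1);

	if (!ok)
		return 0;
	/* Clear lower bits until only the highest (largest size) remains. */
	while (ok & (ok - 1))
		ok &= ok - 1;
	return ok;
}
```

For example, with a supported-size bitmap of {4K, 2M, 1G}, a 4 MB region
starting at a 2 MB-aligned address selects 2 MB pages, while shifting the
start by one 4 KB page drops the selection back to 4 KB.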