From patchwork Tue Aug 16 08:11:53 2022
X-Patchwork-Submitter: Kashyap Desai
X-Patchwork-Id: 12944514
From: Kashyap Desai
To: linux-rdma@vger.kernel.org
Cc: jgg@nvidia.com, leonro@nvidia.com, selvin.xavier@broadcom.com,
    andrew.gospodarek@broadcom.com, Kashyap Desai
Subject: [PATCH rdma-rc v1] RDMA/core: fix sg_to_page mapping for boundary condition
Date: Tue, 16 Aug 2022 13:41:53 +0530
Message-Id: <20220816081153.1580612-1-kashyap.desai@broadcom.com>
X-Mailer: git-send-email 2.27.0
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

This issue frequently hits if the AMD IOMMU is enabled. In case of a 1MB
data transfer, the ib core is supposed to set 256 entries of 4K page size
in the MR page table.
Because of a defect in ib_sg_to_pages, it breaks just after setting one
entry. The memory region page table may then be left with stale entries
(or NULL entries if the memory was memset). Something like this -

crash> x/32a 0xffff9cd9f7f84000
<<< - These are stale entries; only the first entry is valid - >>>
0xffff9cd9f7f84000:     0xfffffffffff00000      0x68d31000
0xffff9cd9f7f84010:     0x68d32000      0x68d33000
0xffff9cd9f7f84020:     0x68d34000      0x975c5000
0xffff9cd9f7f84030:     0x975c6000      0x975c7000
0xffff9cd9f7f84040:     0x975c8000      0x975c9000
0xffff9cd9f7f84050:     0x975ca000      0x975cb000
0xffff9cd9f7f84060:     0x975cc000      0x975cd000
0xffff9cd9f7f84070:     0x975ce000      0x975cf000
0xffff9cd9f7f84080:     0x0     0x0
0xffff9cd9f7f84090:     0x0     0x0
0xffff9cd9f7f840a0:     0x0     0x0
0xffff9cd9f7f840b0:     0x0     0x0
0xffff9cd9f7f840c0:     0x0     0x0
0xffff9cd9f7f840d0:     0x0     0x0
0xffff9cd9f7f840e0:     0x0     0x0
0xffff9cd9f7f840f0:     0x0     0x0

All addresses other than 0xfffffffffff00000 are stale entries. Once such
incorrect page entries are passed to the RDMA h/w, the AMD IOMMU module
detects a page fault whenever the h/w tries to access addresses which
were not actually populated by the ib stack. The prints below are logged
whenever this issue hits:

bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e
address=0x68d31000 flags=0x0050]

ib_sg_to_pages populates the correct page address in most cases, but
there is one boundary condition which is not handled correctly.

Boundary condition explained -
Page addresses are not populated correctly if the dma buffer is mapped
to the very last region of the address space. One example - whenever
page_addr is 0xfffffffffff00000 (the last 1MB section of the address
space) and the dma length is 1MB, the end of the dma address becomes 0
(derived from 0xfffffffffff00000 + 0x100000, which wraps around in u64
arithmetic), so the mapping loop exits after setting only the first
page entry.

Use the dma buffer length instead of end_dma_addr to fill page
addresses.

v0->v1:
Use first_page_off instead of page_off for readability.
Fix functional issue of not resetting first_page_off.

Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Kashyap Desai
---
 drivers/infiniband/core/verbs.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e54b3f1b730e..5e72c44bac3a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2676,15 +2676,19 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 		u64 dma_addr = sg_dma_address(sg) + sg_offset;
 		u64 prev_addr = dma_addr;
 		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
+		unsigned int curr_dma_len = 0;
+		unsigned int first_page_off = 0;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
+		if (i == 0)
+			first_page_off = dma_addr - page_addr;
 		/*
 		 * For the second and later elements, check whether either the
 		 * end of element i-1 or the start of element i is not aligned
 		 * on a page boundary.
 		 */
-		if (i && (last_page_off != 0 || page_addr != dma_addr)) {
+		else if (last_page_off != 0 || page_addr != dma_addr) {
 			/* Stop mapping if there is a gap. */
 			if (last_end_dma_addr != dma_addr)
 				break;
@@ -2708,8 +2712,10 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 			}
 			prev_addr = page_addr;
 next_page:
+			curr_dma_len += mr->page_size - first_page_off;
 			page_addr += mr->page_size;
-		} while (page_addr < end_dma_addr);
+			first_page_off = 0;
+		} while (curr_dma_len < dma_len);
 
 		mr->length += dma_len;
 		last_end_dma_addr = end_dma_addr;
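
For illustration only, here is a minimal stand-alone sketch of the
boundary condition described above. It is not the kernel code: the 4K
page size and the page-aligned sample address are assumptions taken from
the crash dump, and the variable names merely mirror those used in
ib_sg_to_pages. It walks the same 1MB region once with the old
end-address test and once with the length-based test used by the patch.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	const uint64_t page_size = 0x1000;                /* assumed 4K page size */
	const uint64_t dma_addr  = 0xfffffffffff00000ULL; /* last 1MB of the space */
	const uint64_t dma_len   = 0x100000;              /* 1MB transfer */

	/* Old logic: end_dma_addr = dma_addr + dma_len wraps to 0 in u64
	 * arithmetic, so (page_addr < end_dma_addr) is false after one pass. */
	uint64_t end_dma_addr = dma_addr + dma_len;
	uint64_t page_addr = dma_addr;
	unsigned int old_pages = 0;
	do {
		old_pages++;              /* set_page() would be called here */
		page_addr += page_size;
	} while (page_addr < end_dma_addr);

	/* Patched logic: count how many bytes have been covered instead of
	 * comparing against the (wrapped) end address. dma_addr is page
	 * aligned here, so first_page_off would be 0 and is omitted. */
	uint64_t curr_dma_len = 0;
	unsigned int new_pages = 0;
	page_addr = dma_addr;
	do {
		new_pages++;
		curr_dma_len += page_size;
		page_addr += page_size;
	} while (curr_dma_len < dma_len);

	printf("end-address test: %u page(s), length test: %u page(s)\n",
	       old_pages, new_pages);    /* prints 1 vs 256 */
	return 0;
}

With the sample address from the crash dump, the first loop maps a
single page while the second maps all 256 pages, which is the behaviour
the patch establishes in ib_sg_to_pages by looping on curr_dma_len.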