From patchwork Thu Jun 25 07:43:29 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Song Bao Hua (Barry Song)" X-Patchwork-Id: 11624721 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D160214F6 for ; Thu, 25 Jun 2020 07:47:34 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AAFF820675 for ; Thu, 25 Jun 2020 07:47:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="Rn+DFcBR" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AAFF820675 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=hisilicon.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-ID:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=T85s0KG/YNWoiTMwyG+F/cbiez7vXbIuA+l+oF7l5oU=; b=Rn+DFcBRUvIVaj8qP6sOdFg2/ x0abghPsLo4d84KsTGvnv0wzHVnyzHYY73QjZYloyPRj82Ls2VotBmiql+7KV5Rk7+Dorw3iQYlnm xT4RV3Zq0fc94GWfHoWVXPCMCBRWvqFPmgs7+CcU0ErJ77j6SIUs4BTRIAjClGpEPh6CFIj0Tlywm +LLvajbhiezNW007R04R46+0SlCbyDVpHPk2UYlWerBMUcUpVuLCcufS2zpIdlfUDZad6HoelfIlm ojeMomdc6uV9oW64zmp4xILso8lnlXpflKxHqIhYplVwdHO7gIcgcZY8lPz5GvtyEZtNud0Bw9Wql I3tEGaB3g==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1joMZt-0005m4-Tz; Thu, 25 Jun 2020 07:45:45 +0000 Received: from szxga06-in.huawei.com ([45.249.212.32] helo=huawei.com) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1joMZk-0005f5-O5 for linux-arm-kernel@lists.infradead.org; Thu, 25 Jun 2020 07:45:38 +0000 Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.58]) by Forcepoint Email with ESMTP id B906B16F08BD6AF8730C; Thu, 25 Jun 2020 15:45:28 +0800 (CST) Received: from SWX921481.china.huawei.com (10.126.203.118) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.487.0; Thu, 25 Jun 2020 15:45:20 +0800 From: Barry Song To: , , , , , Subject: [PATCH v2 1/2] dma-direct: provide the ability to reserve per-numa CMA Date: Thu, 25 Jun 2020 19:43:29 +1200 Message-ID: <20200625074330.13668-2-song.bao.hua@hisilicon.com> X-Mailer: git-send-email 2.21.0.windows.1 In-Reply-To: <20200625074330.13668-1-song.bao.hua@hisilicon.com> References: <20200625074330.13668-1-song.bao.hua@hisilicon.com> MIME-Version: 1.0 X-Originating-IP: [10.126.203.118] X-CFilter-Loop: Reflected X-Spam-Note: CRM114 invocation failed X-Spam-Score: -2.3 (--) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-2.3 points) pts rule name description ---- ---------------------- -------------------------------------------------- 0.0 RCVD_IN_MSPIKE_H4 RBL: Very Good reputation (+4) [45.249.212.32 listed in wl.mailspike.net] -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [45.249.212.32 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Barry Song , Steve Capper , linuxarm@huawei.com, linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org, Nicolas Saenz Julienne , Jonathan Cameron , Andrew Morton , Mike Rapoport , linux-arm-kernel@lists.infradead.org Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org This is useful for at least two scenarios: 1. ARM64 smmu will get memory from local numa node, it can save its command queues and page tables locally. Tests show it can decrease dma_unmap latency at lot. For example, without this patch, smmu on node2 will get memory from node0 by calling dma_alloc_coherent(), typically, it has to wait for more than 560ns for the completion of CMD_SYNC in an empty command queue; with this patch, it needs 240ns only. 2. when we set iommu passthrough, drivers will get memory from CMA, local memory means much less latency. Cc: Jonathan Cameron Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Will Deacon Cc: Robin Murphy Cc: Ganapatrao Kulkarni Cc: Catalin Marinas Cc: Nicolas Saenz Julienne Cc: Steve Capper Cc: Andrew Morton Cc: Mike Rapoport Signed-off-by: Barry Song Reported-by: kernel test robot --- include/linux/dma-contiguous.h | 4 ++ kernel/dma/Kconfig | 10 ++++ kernel/dma/contiguous.c | 99 ++++++++++++++++++++++++++++++---- 3 files changed, 104 insertions(+), 9 deletions(-) diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h index 03f8e98e3bcc..278a80a40456 100644 --- a/include/linux/dma-contiguous.h +++ b/include/linux/dma-contiguous.h @@ -79,6 +79,8 @@ static inline void dma_contiguous_set_default(struct cma *cma) void dma_contiguous_reserve(phys_addr_t addr_limit); +void dma_pernuma_cma_reserve(void); + int __init dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, struct cma **res_cma, bool fixed); @@ -128,6 +130,8 @@ static inline void dma_contiguous_set_default(struct cma *cma) { } static inline void dma_contiguous_reserve(phys_addr_t limit) { } +static inline void dma_pernuma_cma_reserve(void) { } + static inline int dma_contiguous_reserve_area(phys_addr_t size, phys_addr_t base, phys_addr_t limit, struct cma **res_cma, bool fixed) diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index d006668c0027..aeb976b1d21c 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -104,6 +104,16 @@ config DMA_CMA if DMA_CMA comment "Default contiguous memory area size:" +config CMA_PERNUMA_SIZE_MBYTES + int "Size in Mega Bytes for per-numa CMA areas" + depends on NUMA + default 16 if ARM64 + default 0 + help + Defines the size (in MiB) of the per-numa memory area for Contiguous + Memory Allocator. Every numa node will get a separate CMA with this + size. If the size of 0 is selected, per-numa CMA is disabled. + config CMA_SIZE_MBYTES int "Size in Mega Bytes" depends on !CMA_SIZE_SEL_PERCENTAGE diff --git a/kernel/dma/contiguous.c b/kernel/dma/contiguous.c index 15bc5026c485..bcbd53aead93 100644 --- a/kernel/dma/contiguous.c +++ b/kernel/dma/contiguous.c @@ -30,7 +30,14 @@ #define CMA_SIZE_MBYTES 0 #endif +#ifdef CONFIG_CMA_PERNUMA_SIZE_MBYTES +#define CMA_SIZE_PERNUMA_MBYTES CONFIG_CMA_PERNUMA_SIZE_MBYTES +#else +#define CMA_SIZE_PERNUMA_MBYTES 0 +#endif + struct cma *dma_contiguous_default_area; +static struct cma *dma_contiguous_pernuma_area[MAX_NUMNODES]; /* * Default global CMA area size can be defined in kernel's .config. @@ -44,6 +51,8 @@ struct cma *dma_contiguous_default_area; */ static const phys_addr_t size_bytes __initconst = (phys_addr_t)CMA_SIZE_MBYTES * SZ_1M; +static const phys_addr_t pernuma_size_bytes __initconst = + (phys_addr_t)CMA_SIZE_PERNUMA_MBYTES * SZ_1M; static phys_addr_t size_cmdline __initdata = -1; static phys_addr_t base_cmdline __initdata; static phys_addr_t limit_cmdline __initdata; @@ -96,6 +105,33 @@ static inline __maybe_unused phys_addr_t cma_early_percent_memory(void) #endif +void __init dma_pernuma_cma_reserve(void) +{ + int nid; + + if (!pernuma_size_bytes || nr_online_nodes <= 1) + return; + + for_each_node_state(nid, N_ONLINE) { + int ret; + char name[20]; + + snprintf(name, sizeof(name), "pernuma%d", nid); + ret = cma_declare_contiguous_nid(0, pernuma_size_bytes, 0, 0, + 0, false, name, + &dma_contiguous_pernuma_area[nid], + nid); + if (ret) { + pr_warn("%s: reservation failed: err %d, node %d", __func__, + ret, nid); + continue; + } + + pr_debug("%s: reserved %llu MiB on node %d\n", __func__, + (unsigned long long)pernuma_size_bytes / SZ_1M, nid); + } +} + /** * dma_contiguous_reserve() - reserve area(s) for contiguous memory handling * @limit: End address of the reserved memory (optional, 0 for any). @@ -222,22 +258,31 @@ bool dma_release_from_contiguous(struct device *dev, struct page *pages, * @gfp: Allocation flags. * * This function allocates contiguous memory buffer for specified device. It - * tries to use device specific contiguous memory area if available, or the - * default global one. + * tries to use device specific contiguous memory area if available, or it + * tries to use per-numa cma, if the allocation fails, it will fallback to + * try default global one. * - * Note that it byapss one-page size of allocations from the global area as - * the addresses within one page are always contiguous, so there is no need - * to waste CMA pages for that kind; it also helps reduce fragmentations. + * Note that it bypass one-page size of allocations from the per-numa and + * global area as the addresses within one page are always contiguous, so + * there is no need to waste CMA pages for that kind; it also helps reduce + * fragmentations. */ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp) { size_t count = size >> PAGE_SHIFT; struct page *page = NULL; struct cma *cma = NULL; + int nid = dev ? dev_to_node(dev) : NUMA_NO_NODE; + bool alloc_from_pernuma = false; if (dev && dev->cma_area) cma = dev->cma_area; - else if (count > 1) + else if ((nid != NUMA_NO_NODE) && dma_contiguous_pernuma_area[nid] + && !(gfp & (GFP_DMA | GFP_DMA32)) + && (count > 1)) { + cma = dma_contiguous_pernuma_area[nid]; + alloc_from_pernuma = true; + } else if (count > 1) cma = dma_contiguous_default_area; /* CMA can be used only in the context which permits sleeping */ @@ -246,6 +291,11 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp) size_t cma_align = min_t(size_t, align, CONFIG_CMA_ALIGNMENT); page = cma_alloc(cma, count, cma_align, gfp & __GFP_NOWARN); + + /* fall back to default cma if failed in per-numa cma */ + if (!page && alloc_from_pernuma) + page = cma_alloc(dma_contiguous_default_area, count, + cma_align, gfp & __GFP_NOWARN); } return page; @@ -264,9 +314,40 @@ struct page *dma_alloc_contiguous(struct device *dev, size_t size, gfp_t gfp) */ void dma_free_contiguous(struct device *dev, struct page *page, size_t size) { - if (!cma_release(dev_get_cma_area(dev), page, - PAGE_ALIGN(size) >> PAGE_SHIFT)) - __free_pages(page, get_order(size)); + /* if dev has its own cma, free page from there */ + if (dev && dev->cma_area) { + if (cma_release(dev->cma_area, page, PAGE_ALIGN(size) >> PAGE_SHIFT)) + return; + } else { + /* + * otherwise, page is from either per-numa cma or default cma + */ + int nid = dev ? dev_to_node(dev) : NUMA_NO_NODE; + + if (nid != NUMA_NO_NODE) { + int i; + + /* + * Literally we only need to call cma_release() on pernuma cma of + * node nid, howerver, a corner case is that users might write + * /sys/devices/pci-x/numa_node to change node to workaround firmware + * bug, so it might allocate memory from nodeA CMA, but free from nodeB + * CMA. + */ + for (i = 0; i < MAX_NUMNODES; i++) { + if (cma_release(dma_contiguous_pernuma_area[i], page, + PAGE_ALIGN(size) >> PAGE_SHIFT)) + return; + } + } + + if (cma_release(dma_contiguous_default_area, page, + PAGE_ALIGN(size) >> PAGE_SHIFT)) + return; + } + + /* not in any cma, free from buddy */ + __free_pages(page, get_order(size)); } /* From patchwork Thu Jun 25 07:43:30 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Song Bao Hua (Barry Song)" X-Patchwork-Id: 11624723 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 00EDD138C for ; Thu, 25 Jun 2020 07:47:48 +0000 (UTC) Received: from merlin.infradead.org (merlin.infradead.org [205.233.59.134]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C968120675 for ; Thu, 25 Jun 2020 07:47:47 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=lists.infradead.org header.i=@lists.infradead.org header.b="kDX06ol6" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C968120675 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=hisilicon.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=merlin.20170209; h=Sender:Content-Transfer-Encoding: Content-Type:Cc:List-Subscribe:List-Help:List-Post:List-Archive: List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To:Message-ID:Date: Subject:To:From:Reply-To:Content-ID:Content-Description:Resent-Date: Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID:List-Owner; bh=2A3e6I9RMz2alaTu+rdBKXlYpzhIi8bFjZxwSufv6fk=; b=kDX06ol68U0ebxN4P3gjaUxTu Yyb78npjj0mtQlM26OVwk5wRLvtgPu4zHURP798xgKZhtmENniFJWKCqFmOPUMd6V42FEtbOTAK9z h8OMdWb8W8s06MJRrhivYB3yF5IcfU1I0DknofcAQuvLp0Nnlxd/nsArnZz3HLpC3JYBtA5ZEJ7Pu UkMBL5uLlmi0EIxRraYAlEZg9rKVBi9ifcq1N2a+3Op/7+p1qU2DPTQJ2h6vsYNkKLXQxnobotUpx TP2fUTYrgIW/iwPiSUEjcsTUTQOy9CB3S5EANPtSKTRgq3aVxV4sR8lyrE40RW90ct1aKHxxXKZcy m9fRM0d+Q==; Received: from localhost ([::1] helo=merlin.infradead.org) by merlin.infradead.org with esmtp (Exim 4.92.3 #3 (Red Hat Linux)) id 1joMZy-0005n2-UG; Thu, 25 Jun 2020 07:45:50 +0000 Received: from szxga05-in.huawei.com ([45.249.212.191] helo=huawei.com) by merlin.infradead.org with esmtps (Exim 4.92.3 #3 (Red Hat Linux)) id 1joMZl-0005he-IQ for linux-arm-kernel@lists.infradead.org; Thu, 25 Jun 2020 07:45:39 +0000 Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.60]) by Forcepoint Email with ESMTP id CC418F2672C0FCA43C08; Thu, 25 Jun 2020 15:45:33 +0800 (CST) Received: from SWX921481.china.huawei.com (10.126.203.118) by DGGEMS410-HUB.china.huawei.com (10.3.19.210) with Microsoft SMTP Server id 14.3.487.0; Thu, 25 Jun 2020 15:45:23 +0800 From: Barry Song To: , , , , , Subject: [PATCH v2 2/2] arm64: mm: reserve per-numa CMA after numa_init Date: Thu, 25 Jun 2020 19:43:30 +1200 Message-ID: <20200625074330.13668-3-song.bao.hua@hisilicon.com> X-Mailer: git-send-email 2.21.0.windows.1 In-Reply-To: <20200625074330.13668-1-song.bao.hua@hisilicon.com> References: <20200625074330.13668-1-song.bao.hua@hisilicon.com> MIME-Version: 1.0 X-Originating-IP: [10.126.203.118] X-CFilter-Loop: Reflected X-Spam-Note: CRM114 invocation failed X-Spam-Score: -2.3 (--) X-Spam-Report: SpamAssassin version 3.4.4 on merlin.infradead.org summary: Content analysis details: (-2.3 points) pts rule name description ---- ---------------------- -------------------------------------------------- -2.3 RCVD_IN_DNSWL_MED RBL: Sender listed at https://www.dnswl.org/, medium trust [45.249.212.191 listed in list.dnswl.org] -0.0 SPF_PASS SPF: sender matches SPF record -0.0 SPF_HELO_PASS SPF: HELO matches SPF record 0.0 RCVD_IN_MSPIKE_H4 RBL: Very Good reputation (+4) [45.249.212.191 listed in wl.mailspike.net] 0.0 RCVD_IN_MSPIKE_WL Mailspike good senders X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Barry Song , Steve Capper , linuxarm@huawei.com, linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org, linux-arm-kernel@lists.infradead.org, Andrew Morton , Mike Rapoport , Nicolas Saenz Julienne Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+patchwork-linux-arm=patchwork.kernel.org@lists.infradead.org Right now, smmu is using dma_alloc_coherent() to get memory to save queues and tables. Typically, on ARM64 server, there is a default CMA located at node0, which could be far away from node2, node3 etc. with this patch, smmu will get memory from local numa node to save command queues and page tables. that means dma_unmap latency will be shrunk much. Meanwhile, when iommu.passthrough is on, device drivers which call dma_ alloc_coherent() will also get local memory and avoid the travel between numa nodes. Cc: Christoph Hellwig Cc: Marek Szyprowski Cc: Will Deacon Cc: Robin Murphy Cc: Ganapatrao Kulkarni Cc: Catalin Marinas Cc: Nicolas Saenz Julienne Cc: Steve Capper Cc: Andrew Morton Cc: Mike Rapoport Signed-off-by: Barry Song --- arch/arm64/mm/init.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 1e93cfc7c47a..07d4d1fe7983 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -420,6 +420,8 @@ void __init bootmem_init(void) arm64_numa_init(); + dma_pernuma_cma_reserve(); + /* * must be done after arm64_numa_init() which calls numa_init() to * initialize node_online_map that gets used in hugetlb_cma_reserve()