Message ID | 20200628111251.19108-1-song.bao.hua@hisilicon.com (mailing list archive) |
---|---|
Headers | show |
Series | make dma_alloc_coherent NUMA-aware by per-NUMA CMA | expand |
> -----Original Message----- > From: Song Bao Hua (Barry Song) > Sent: Sunday, June 28, 2020 11:13 PM > To: hch@lst.de; m.szyprowski@samsung.com; robin.murphy@arm.com; > will@kernel.org; ganapatrao.kulkarni@cavium.com; > catalin.marinas@arm.com > Cc: iommu@lists.linux-foundation.org; Linuxarm <linuxarm@huawei.com>; > linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org; Song Bao > Hua (Barry Song) <song.bao.hua@hisilicon.com> > Subject: [PATCH v3 0/2] make dma_alloc_coherent NUMA-aware by > per-NUMA CMA > > Ganapatrao Kulkarni has put some effort on making arm-smmu-v3 use local > memory to save command queues[1]. I also did similar job in patch > "iommu/arm-smmu-v3: allocate the memory of queues in local numa node" > [2] while not realizing Ganapatrao has done that before. > > But it seems it is much better to make dma_alloc_coherent() to be inherently > NUMA-aware on NUMA-capable systems. > > Right now, smmu is using dma_alloc_coherent() to get memory to save queues > and tables. Typically, on ARM64 server, there is a default CMA located at > node0, which could be far away from node2, node3 etc. > Saving queues and tables remotely will increase the latency of ARM SMMU > significantly. For example, when SMMU is at node2 and the default global > CMA is at node0, after sending a CMD_SYNC in an empty command queue, we > have to wait more than 550ns for the completion of the command > CMD_SYNC. > However, if we save them locally, we only need to wait for 240ns. > > with per-numa CMA, smmu will get memory from local numa node to save > command queues and page tables. that means dma_unmap latency will be > shrunk much. > > Meanwhile, when iommu.passthrough is on, device drivers which call dma_ > alloc_coherent() will also get local memory and avoid the travel between > numa nodes. > > [1] > https://lists.linuxfoundation.org/pipermail/iommu/2017-October/024455.htm > l > [2] https://www.spinics.net/lists/iommu/msg44767.html > > -v3: > * move to use page_to_nid() while freeing cma with respect to Robin's > comment, but this will only work after applying my below patch: > "mm/cma.c: use exact_nid true to fix possible per-numa cma leak" > https://marc.info/?l=linux-mm&m=159333034726647&w=2 > > * handle the case count <= 1 more properly according to Robin's > comment; > > * add pernuma_cma parameter to support dynamic setting of per-numa > cma size; > ideally we can leverage the CMA_SIZE_MBYTES, CMA_SIZE_PERCENTAGE and > "cma=" kernel parameter and avoid a new paramter separately for per- > numa cma. Practically, it is really too complicated considering the > below problems: > (1) if we leverage the size of default numa for per-numa, we have to > avoid creating two cma with same size in node0 since default cma is > probably on node0. > (2) default cma can consider the address limitation for old devices > while per-numa cma doesn't support GFP_DMA and GFP_DMA32. all > allocations with limitation flags will fallback to default one. > (3) hard to apply CMA_SIZE_PERCENTAGE to per-numa. it is hard to > decide if the percentage should apply to the whole memory size > or only apply to the memory size of a specific numa node. > (4) default cma size has CMA_SIZE_SEL_MIN and CMA_SIZE_SEL_MAX, it > makes things even more complicated to per-numa cma. > > I haven't figured out a good way to leverage the size of default cma > for per-numa cma. it seems a separate parameter for per-numa could > make life easier. > > * move dma_pernuma_cma_reserve() after hugetlb_cma_reserve() to > reuse the comment before hugetlb_cma_reserve() with respect to > Robin's comment > > -v2: > * fix some issues reported by kernel test robot > * fallback to default cma while allocation fails in per-numa cma > free memory properly > > Barry Song (2): > dma-direct: provide the ability to reserve per-numa CMA > arm64: mm: reserve per-numa CMA to localize coherent dma buffers > > .../admin-guide/kernel-parameters.txt | 9 ++ > arch/arm64/mm/init.c | 2 + > include/linux/dma-contiguous.h | 4 + > kernel/dma/Kconfig | 10 ++ > kernel/dma/contiguous.c | 98 > +++++++++++++++++-- > 5 files changed, 114 insertions(+), 9 deletions(-) Gentle ping :-) Thanks Barry