From patchwork Fri Dec 15 08:43:02 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Boris Brezillon X-Patchwork-Id: 13494130 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id B78D6C4332F for ; Fri, 15 Dec 2023 08:43:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:Cc:To:From:Reply-To:Content-ID:Content-Description: Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc:Resent-Message-ID: List-Owner; bh=fMVMTpxGR/jK1qSjltUM1qEBSMIN5VOME9NTjgaJxj8=; b=lBB7TJgeyRvv0P tBqorPln7z4YynihZqXq10iDUOPxZ8C1YrIZ8sSrJtjX5/dGGOeSBcsx4I5OeS8QQV2RoFBZNkP+p rNYII3YWmPxQcH7XH5IAejF68Jw0+Tosz+ok9c9ufoxzY24SLeo/qWd6xCAyJ5JPx/62TEoWg42wI 7o1BXgcIGLn7RSl4KkJU0ADr7VNmJdngERxO4H/vx6yiGB7Q6pFpe4P8HrM7X5OEW1Y8nFJe8scpd U+f7Pmg50Bdg3hbFgHV8xIyVeYOPnkEwBOMogjU+5h5UHw5kTllBjvmDzRCUBZr6N9oZ0zHrqqxSv ZXOW+65XxnHQGsIaNmfQ==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.96 #2 (Red Hat Linux)) id 1rE3mr-002Vfd-2C; Fri, 15 Dec 2023 08:43:13 +0000 Received: from madrid.collaboradmins.com ([46.235.227.194]) by bombadil.infradead.org with esmtps (Exim 4.96 #2 (Red Hat Linux)) id 1rE3mo-002VeK-1i for linux-arm-kernel@lists.infradead.org; Fri, 15 Dec 2023 08:43:12 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=collabora.com; s=mail; t=1702629785; bh=aN+i4SK/dFKTtQwnSXWUhbsSxj1u5+1d4N+n2o+PpC8=; h=From:To:Cc:Subject:Date:In-Reply-To:References:From; b=w3bX4rEroezOdZQLAvanCzXWPVpWfN9WzVNLzs+Rzn31qj+8gL9AOcnIHspPRLWXP pe6hbDg4RQRg7vVu8Lc9QtbbphOIuZ/USMH1DpqnMmkrPABxwfE7iCFLbT1nr+qjE0 MpHgHUVlIxcevm3Sv8l3L1vh2s4jn6pFeU8Wlb+Nb8nBBb2lOyczIKO1XN3Jwds8rL 9cFbEvRiAR9vN7LEcWscHPeimOZdbNp4UvYxJ9bVp1Xp1EsQrnn4YtNTqkdmC3wKJE aMRIdUjYWi9GbSW7jW7+CW4D1qLgv5zMB033o0NQ0ksB6Q5WFdybus8JZ1UKCxkdgj wB1mfEC8dJC2g== Received: from localhost.localdomain (cola.collaboradmins.com [195.201.22.229]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) (Authenticated sender: bbrezillon) by madrid.collaboradmins.com (Postfix) with ESMTPSA id D005937813DA; Fri, 15 Dec 2023 08:43:04 +0000 (UTC) From: Boris Brezillon To: Joerg Roedel , iommu@lists.linux.dev, Will Deacon , Robin Murphy , linux-arm-kernel@lists.infradead.org Cc: Rob Clark , Gaurav Kohli , Steven Price , kernel@collabora.com, Jason Gunthorpe , Boris Brezillon Subject: [PATCH v4 2/2] iommu: Extend LPAE page table format to support custom allocators Date: Fri, 15 Dec 2023 09:43:02 +0100 Message-ID: <20231215084302.2484163-2-boris.brezillon@collabora.com> X-Mailer: git-send-email 2.43.0 In-Reply-To: <20231215084302.2484163-1-boris.brezillon@collabora.com> References: <20231215084302.2484163-1-boris.brezillon@collabora.com> MIME-Version: 1.0 X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20231215_004310_715342_040DF2EB X-CRM114-Status: GOOD ( 23.38 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org We need that in order to implement the VM_BIND ioctl in the GPU driver targeting new Mali GPUs. VM_BIND is about executing MMU map/unmap requests asynchronously, possibly after waiting for external dependencies encoded as dma_fences. We intend to use the drm_sched framework to automate the dependency tracking and VM job dequeuing logic, but this comes with its own set of constraints, one of them being the fact we are not allowed to allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this sort of deadlocks: - VM_BIND map job needs to allocate a page table to map some memory to the VM. No memory available, so kswapd is kicked - GPU driver shrinker backend ends up waiting on the fence attached to the VM map job or any other job fence depending on this VM operation. With custom allocators, we will be able to pre-reserve enough pages to guarantee the map/unmap operations we queued will take place without going through the system allocator. But we can also optimize allocation/reservation by not free-ing pages immediately, so any upcoming page table allocation requests can be serviced by some free page table pool kept at the driver level. I might also be valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. Signed-off-by: Boris Brezillon Reviewed-by: Steven Price Reviewed-by: Robin Murphy Reviewed-by: Gaurav Kohli Tested-by: Gaurav Kohli --- v4: - Add Gaurav's R-b/T-b v3: - Don't pass __GFP_ZERO to the custom ->alloc() hook. Returning zeroed mem is partof the agreement between the io-pgtable and its user - Add Robin R-b v2: - Add Steven R-b - Expand on possible use-cases for custom allocators --- drivers/iommu/io-pgtable-arm.c | 55 ++++++++++++++++++++++++---------- 1 file changed, 39 insertions(+), 16 deletions(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 72dcdd468cf3..f7828a7aad41 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) } static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, - struct io_pgtable_cfg *cfg) + struct io_pgtable_cfg *cfg, + void *cookie) { struct device *dev = cfg->iommu_dev; int order = get_order(size); - struct page *p; dma_addr_t dma; void *pages; VM_BUG_ON((gfp & __GFP_HIGHMEM)); - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); - if (!p) + + if (cfg->alloc) { + pages = cfg->alloc(cookie, size, gfp); + } else { + struct page *p; + + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); + pages = p ? page_address(p) : NULL; + } + + if (!pages) return NULL; - pages = page_address(p); if (!cfg->coherent_walk) { dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); if (dma_mapping_error(dev, dma)) @@ -220,18 +228,28 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, out_unmap: dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n"); dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); + out_free: - __free_pages(p, order); + if (cfg->free) + cfg->free(cookie, pages, size); + else + free_pages((unsigned long)pages, order); + return NULL; } static void __arm_lpae_free_pages(void *pages, size_t size, - struct io_pgtable_cfg *cfg) + struct io_pgtable_cfg *cfg, + void *cookie) { if (!cfg->coherent_walk) dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages), size, DMA_TO_DEVICE); - free_pages((unsigned long)pages, get_order(size)); + + if (cfg->free) + cfg->free(cookie, pages, size); + else + free_pages((unsigned long)pages, get_order(size)); } static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries, @@ -373,13 +391,13 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, /* Grab a pointer to the next level */ pte = READ_ONCE(*ptep); if (!pte) { - cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg); + cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg, data->iop.cookie); if (!cptep) return -ENOMEM; pte = arm_lpae_install_table(cptep, ptep, 0, data); if (pte) - __arm_lpae_free_pages(cptep, tblsz, cfg); + __arm_lpae_free_pages(cptep, tblsz, cfg, data->iop.cookie); } else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) { __arm_lpae_sync_pte(ptep, 1, cfg); } @@ -524,7 +542,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, __arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data)); } - __arm_lpae_free_pages(start, table_size, &data->iop.cfg); + __arm_lpae_free_pages(start, table_size, &data->iop.cfg, data->iop.cookie); } static void arm_lpae_free_pgtable(struct io_pgtable *iop) @@ -552,7 +570,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) return 0; - tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg); + tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie); if (!tablep) return 0; /* Bytes unmapped */ @@ -575,7 +593,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, pte = arm_lpae_install_table(tablep, ptep, blk_pte, data); if (pte != blk_pte) { - __arm_lpae_free_pages(tablep, tablesz, cfg); + __arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie); /* * We may race against someone unmapping another part of this * block, but anything else is invalid. We can't misinterpret @@ -882,7 +900,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) /* Looking good; allocate a pgd */ data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), - GFP_KERNEL, cfg); + GFP_KERNEL, cfg, cookie); if (!data->pgd) goto out_free_data; @@ -984,7 +1002,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) /* Allocate pgd pages */ data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), - GFP_KERNEL, cfg); + GFP_KERNEL, cfg, cookie); if (!data->pgd) goto out_free_data; @@ -1059,7 +1077,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL, - cfg); + cfg, cookie); if (!data->pgd) goto out_free_data; @@ -1080,26 +1098,31 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) } struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s1_init_fns = { + .caps = IO_PGTABLE_CAP_CUSTOM_ALLOCATOR, .alloc = arm_64_lpae_alloc_pgtable_s1, .free = arm_lpae_free_pgtable, }; struct io_pgtable_init_fns io_pgtable_arm_64_lpae_s2_init_fns = { + .caps = IO_PGTABLE_CAP_CUSTOM_ALLOCATOR, .alloc = arm_64_lpae_alloc_pgtable_s2, .free = arm_lpae_free_pgtable, }; struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s1_init_fns = { + .caps = IO_PGTABLE_CAP_CUSTOM_ALLOCATOR, .alloc = arm_32_lpae_alloc_pgtable_s1, .free = arm_lpae_free_pgtable, }; struct io_pgtable_init_fns io_pgtable_arm_32_lpae_s2_init_fns = { + .caps = IO_PGTABLE_CAP_CUSTOM_ALLOCATOR, .alloc = arm_32_lpae_alloc_pgtable_s2, .free = arm_lpae_free_pgtable, }; struct io_pgtable_init_fns io_pgtable_arm_mali_lpae_init_fns = { + .caps = IO_PGTABLE_CAP_CUSTOM_ALLOCATOR, .alloc = arm_mali_lpae_alloc_pgtable, .free = arm_lpae_free_pgtable, };