Message ID | 20230809121744.2341454-3-boris.brezillon@collabora.com (mailing list archive) |
---|---|
State | New, archived |
Series | iommu: Allow passing custom allocators to pgtable drivers |
On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote: > We need that in order to implement the VM_BIND ioctl in the GPU driver > targeting new Mali GPUs. > > VM_BIND is about executing MMU map/unmap requests asynchronously, > possibly after waiting for external dependencies encoded as dma_fences. > We intend to use the drm_sched framework to automate the dependency > tracking and VM job dequeuing logic, but this comes with its own set > of constraints, one of them being the fact we are not allowed to > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this > sort of deadlocks: > > - VM_BIND map job needs to allocate a page table to map some memory > to the VM. No memory available, so kswapd is kicked > - GPU driver shrinker backend ends up waiting on the fence attached to > the VM map job or any other job fence depending on this VM operation. > > With custom allocators, we will be able to pre-reserve enough pages to > guarantee the map/unmap operations we queued will take place without > going through the system allocator. But we can also optimize > allocation/reservation by not free-ing pages immediately, so any > upcoming page table allocation requests can be serviced by some free > page table pool kept at the driver level. > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++----------- > drivers/iommu/io-pgtable.c | 12 ++++++++ > 2 files changed, 46 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 72dcdd468cf3..c5c04f0106f3 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) > } > > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, > - struct io_pgtable_cfg *cfg) > + struct io_pgtable_cfg *cfg, > + void *cookie) > { > struct device *dev = cfg->iommu_dev; > int order = get_order(size); > - struct page *p; > dma_addr_t dma; > void *pages; > > VM_BUG_ON((gfp & __GFP_HIGHMEM)); > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > - if (!p) > + > + if (cfg->alloc) { > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO); > + } else { > + struct page *p; > + > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > + pages = p ? page_address(p) : NULL; Hmm, so the reason we pass the order is because the pgd may have additional alignment requirements but that's not passed back to your new allocation hook. Does it just return naturally aligned allocations? If you don't care about the pgd (since it's not something that's going to be freed and reallocated during the lifetime of the page-table), then perhaps it would be cleaner to pass in a 'struct kmem_cache' for the non-pgd pages when configuring the page-table, rather than override the allocation function wholesale? I think that would also mean that all the lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a little less fragile. Will
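To make Will's kmem_cache suggestion a little more concrete, a minimal sketch might look like the following. None of these names exist in the posted series; it simply assumes a 4K granule, so every non-pgd table is a single naturally aligned page:

```c
/*
 * Illustrative sketch of the kmem_cache alternative Will describes: the
 * caller provides a cache whose object size and alignment match the
 * granule, and non-pgd tables are carved out of it. None of these names
 * exist in the posted series.
 */
#include <linux/errno.h>
#include <linux/sizes.h>
#include <linux/slab.h>

static struct kmem_cache *example_pt_cache;

static int example_pt_cache_init(void)
{
	/* One granule per table, naturally aligned so virt_to_phys() keeps working. */
	example_pt_cache = kmem_cache_create("example-io-pgtable",
					     SZ_4K, SZ_4K, 0, NULL);
	return example_pt_cache ? 0 : -ENOMEM;
}

static void *example_pt_cache_alloc(gfp_t gfp)
{
	/* Zeroed, matching the __GFP_ZERO behaviour of __arm_lpae_alloc_pages(). */
	return kmem_cache_zalloc(example_pt_cache, gfp);
}

static void example_pt_cache_free(void *table)
{
	kmem_cache_free(example_pt_cache, table);
}
```

Boris's replies below explain why this alone does not cover the VM_BIND case: the driver also needs a way to guarantee the memory is already reserved by the time the map job runs.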
On Wed, 9 Aug 2023 15:47:21 +0100 Will Deacon <will@kernel.org> wrote: > On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote: > > We need that in order to implement the VM_BIND ioctl in the GPU driver > > targeting new Mali GPUs. > > > > VM_BIND is about executing MMU map/unmap requests asynchronously, > > possibly after waiting for external dependencies encoded as dma_fences. > > We intend to use the drm_sched framework to automate the dependency > > tracking and VM job dequeuing logic, but this comes with its own set > > of constraints, one of them being the fact we are not allowed to > > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this > > sort of deadlocks: > > > > - VM_BIND map job needs to allocate a page table to map some memory > > to the VM. No memory available, so kswapd is kicked > > - GPU driver shrinker backend ends up waiting on the fence attached to > > the VM map job or any other job fence depending on this VM operation. > > > > With custom allocators, we will be able to pre-reserve enough pages to > > guarantee the map/unmap operations we queued will take place without > > going through the system allocator. But we can also optimize > > allocation/reservation by not free-ing pages immediately, so any > > upcoming page table allocation requests can be serviced by some free > > page table pool kept at the driver level. > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > --- > > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++----------- > > drivers/iommu/io-pgtable.c | 12 ++++++++ > > 2 files changed, 46 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > > index 72dcdd468cf3..c5c04f0106f3 100644 > > --- a/drivers/iommu/io-pgtable-arm.c > > +++ b/drivers/iommu/io-pgtable-arm.c > > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) > > } > > > > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, > > - struct io_pgtable_cfg *cfg) > > + struct io_pgtable_cfg *cfg, > > + void *cookie) > > { > > struct device *dev = cfg->iommu_dev; > > int order = get_order(size); > > - struct page *p; > > dma_addr_t dma; > > void *pages; > > > > VM_BUG_ON((gfp & __GFP_HIGHMEM)); > > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > > - if (!p) > > + > > + if (cfg->alloc) { > > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO); > > + } else { > > + struct page *p; > > + > > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > > + pages = p ? page_address(p) : NULL; > > Hmm, so the reason we pass the order is because the pgd may have > additional alignment requirements but that's not passed back to your new > allocation hook. Does it just return naturally aligned allocations? So, the assumption was that the custom allocator should be aware of these alignment constraints. Like, if size > min_granule_sz, then order equal get_order(size). Right now, I reject any size that's not SZ_4K, because, given the VA-space and the pgtable config, I know I'll always end up with 4 levels and the pgd will fit in a 4k page. But if we ever decide we want to support 64k granule or a VA space that's smaller, we'll adjust the custom allocator accordingly. I'm not sure we discussed this specific aspect with Robin, but his point was that the user passing a custom allocator should be aware of the page table format and all the allocation constraints that come with it. 
> 
> If you don't care about the pgd (since it's not something that's going
> to be freed and reallocated during the lifetime of the page-table), then
> perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> non-pgd pages when configuring the page-table, rather than override the
> allocation function wholesale?

I'd be fine with that, but I wasn't sure everyone would be okay to use
a kmem_cache as the page caching mechanism. I know Rob was intending to
use a custom allocator with the MSM MMU to provide the same sort of
caching for the Adreno GPU driver, so it might be good to have his
opinion on that.

> I think that would also mean that all the
> lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> little less fragile.

Well, that would only help if a custom kmem_cache is used, unless you
want to generalize the kmem_cache approach and force it to all
io-pgtable-arm users, in which case I probably don't need to pass a
custom kmem_cache in the first place (mine has nothing special).
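As an illustration of the constraint Boris describes (4K granule, four levels, so every table including the pgd fits in one naturally aligned 4K page), a custom hook pair using the cfg->alloc/cfg->free signatures from this patch might look roughly like this; the function names are hypothetical:

```c
/*
 * Sketch of a minimal custom allocator for the 4K-granule case Boris
 * describes: any request that isn't exactly one granule is rejected, and
 * pages come back naturally aligned and zeroed. Only the hook signatures
 * come from the patch; the function names are hypothetical.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/sizes.h>

static void *example_4k_pt_alloc(void *cookie, size_t size, gfp_t gfp)
{
	struct page *p;

	/* 4K granule + 4 levels: every table, pgd included, is one 4K page. */
	if (WARN_ON(size != SZ_4K))
		return NULL;

	p = alloc_page(gfp | __GFP_ZERO);
	return p ? page_address(p) : NULL;
}

static void example_4k_pt_free(void *cookie, void *pages, size_t size)
{
	free_page((unsigned long)pages);
}
```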
On Wed, 9 Aug 2023 17:10:17 +0200 Boris Brezillon <boris.brezillon@collabora.com> wrote: > On Wed, 9 Aug 2023 15:47:21 +0100 > Will Deacon <will@kernel.org> wrote: > > > On Wed, Aug 09, 2023 at 02:17:44PM +0200, Boris Brezillon wrote: > > > We need that in order to implement the VM_BIND ioctl in the GPU driver > > > targeting new Mali GPUs. > > > > > > VM_BIND is about executing MMU map/unmap requests asynchronously, > > > possibly after waiting for external dependencies encoded as dma_fences. > > > We intend to use the drm_sched framework to automate the dependency > > > tracking and VM job dequeuing logic, but this comes with its own set > > > of constraints, one of them being the fact we are not allowed to > > > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this > > > sort of deadlocks: > > > > > > - VM_BIND map job needs to allocate a page table to map some memory > > > to the VM. No memory available, so kswapd is kicked > > > - GPU driver shrinker backend ends up waiting on the fence attached to > > > the VM map job or any other job fence depending on this VM operation. > > > > > > With custom allocators, we will be able to pre-reserve enough pages to > > > guarantee the map/unmap operations we queued will take place without > > > going through the system allocator. But we can also optimize > > > allocation/reservation by not free-ing pages immediately, so any > > > upcoming page table allocation requests can be serviced by some free > > > page table pool kept at the driver level. > > > > > > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > > > --- > > > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++----------- > > > drivers/iommu/io-pgtable.c | 12 ++++++++ > > > 2 files changed, 46 insertions(+), 16 deletions(-) > > > > > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > > > index 72dcdd468cf3..c5c04f0106f3 100644 > > > --- a/drivers/iommu/io-pgtable-arm.c > > > +++ b/drivers/iommu/io-pgtable-arm.c > > > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) > > > } > > > > > > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, > > > - struct io_pgtable_cfg *cfg) > > > + struct io_pgtable_cfg *cfg, > > > + void *cookie) > > > { > > > struct device *dev = cfg->iommu_dev; > > > int order = get_order(size); > > > - struct page *p; > > > dma_addr_t dma; > > > void *pages; > > > > > > VM_BUG_ON((gfp & __GFP_HIGHMEM)); > > > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > > > - if (!p) > > > + > > > + if (cfg->alloc) { > > > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO); > > > + } else { > > > + struct page *p; > > > + > > > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > > > + pages = p ? page_address(p) : NULL; > > > > Hmm, so the reason we pass the order is because the pgd may have > > additional alignment requirements but that's not passed back to your new > > allocation hook. Does it just return naturally aligned allocations? > > So, the assumption was that the custom allocator should be aware of > these alignment constraints. Like, if size > min_granule_sz, then order > equal get_order(size). > > Right now, I reject any size that's not SZ_4K, because, given the > VA-space and the pgtable config, I know I'll always end up with 4 > levels and the pgd will fit in a 4k page. But if we ever decide we want > to support 64k granule or a VA space that's smaller, we'll adjust the > custom allocator accordingly. 
> 
> I'm not sure we discussed this specific aspect with Robin, but his
> point was that the user passing a custom allocator should be aware of
> the page table format and all the allocation constraints that come with
> it.
> > 
> > If you don't care about the pgd (since it's not something that's going
> > to be freed and reallocated during the lifetime of the page-table), then
> > perhaps it would be cleaner to pass in a 'struct kmem_cache' for the
> > non-pgd pages when configuring the page-table, rather than override the
> > allocation function wholesale?
> 
> I'd be fine with that, but I wasn't sure everyone would be okay to use
> a kmem_cache as the page caching mechanism.

My bad, I still need the custom allocator hooks so I can pre-allocate
pages and make sure no allocation happens in the page table update path
(dma-signaling path where no blocking allocations are allowed). I might
be wrong, but I don't think kmem_cache provides such a reservation
mechanism.

> I know Rob was intending to
> use a custom allocator with the MSM MMU to provide the same sort of
> caching for the Adreno GPU driver, so it might be good to have his
> opinion on that.
> 
> > I think that would also mean that all the
> > lowmem assumptions in the code (e.g. use of virt_to_phys()) would be a
> > little less fragile.
> 
> Well, that would only help if a custom kmem_cache is used, unless you
> want to generalize the kmem_cache approach and force it to all
> io-pgtable-arm users, in which case I probably don't need to pass a
> custom kmem_cache in the first place (mine has nothing special).
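A rough sketch of the reservation scheme Boris is describing, under the assumption that pages are reserved from the ioctl path and the hooks only consume that reservation in the dma-fence signalling path. Only the cfg->alloc/cfg->free signatures come from the series; everything else is made up for illustration:

```c
/*
 * Rough sketch of the pre-reservation scheme described above: pages are
 * reserved ahead of time, outside the dma-fence signalling path, and the
 * custom hooks only take from / give back to that pool. Every name here
 * is made up for illustration; only the cfg->alloc/cfg->free signatures
 * come from the series.
 */
#include <linux/gfp.h>
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/sizes.h>
#include <linux/spinlock.h>
#include <linux/string.h>

struct example_pt_pool {
	spinlock_t lock;
	struct list_head free_pages;	/* reserved 4K pages, linked via page->lru */
};

static void example_pt_pool_init(struct example_pt_pool *pool)
{
	spin_lock_init(&pool->lock);
	INIT_LIST_HEAD(&pool->free_pages);
}

/* Called from the VM_BIND ioctl path, where blocking allocations are fine. */
static int example_pt_pool_reserve(struct example_pt_pool *pool, unsigned int count)
{
	while (count--) {
		struct page *p = alloc_page(GFP_KERNEL | __GFP_ZERO);

		if (!p)
			return -ENOMEM;

		spin_lock(&pool->lock);
		list_add(&p->lru, &pool->free_pages);
		spin_unlock(&pool->lock);
	}

	return 0;
}

/* cfg->alloc hook: must not allocate, only consume the reservation. */
static void *example_pt_pool_alloc(void *cookie, size_t size, gfp_t gfp)
{
	struct example_pt_pool *pool = cookie;
	struct page *p;

	if (WARN_ON(size != SZ_4K))
		return NULL;

	spin_lock(&pool->lock);
	p = list_first_entry_or_null(&pool->free_pages, struct page, lru);
	if (p)
		list_del(&p->lru);
	spin_unlock(&pool->lock);

	return p ? page_address(p) : NULL;
}

/* cfg->free hook: keep the page around for future map/unmap operations. */
static void example_pt_pool_free(void *cookie, void *pages, size_t size)
{
	struct example_pt_pool *pool = cookie;

	/* Re-zero so the alloc hook always hands out zeroed tables. */
	memset(pages, 0, size);

	spin_lock(&pool->lock);
	list_add(&virt_to_page(pages)->lru, &pool->free_pages);
	spin_unlock(&pool->lock);
}
```

A real implementation would also need to size and release the reservation once the VM job completes, but that bookkeeping is omitted here.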
On 09/08/2023 1:17 pm, Boris Brezillon wrote: > We need that in order to implement the VM_BIND ioctl in the GPU driver > targeting new Mali GPUs. > > VM_BIND is about executing MMU map/unmap requests asynchronously, > possibly after waiting for external dependencies encoded as dma_fences. > We intend to use the drm_sched framework to automate the dependency > tracking and VM job dequeuing logic, but this comes with its own set > of constraints, one of them being the fact we are not allowed to > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this > sort of deadlocks: > > - VM_BIND map job needs to allocate a page table to map some memory > to the VM. No memory available, so kswapd is kicked > - GPU driver shrinker backend ends up waiting on the fence attached to > the VM map job or any other job fence depending on this VM operation. > > With custom allocators, we will be able to pre-reserve enough pages to > guarantee the map/unmap operations we queued will take place without > going through the system allocator. But we can also optimize > allocation/reservation by not free-ing pages immediately, so any > upcoming page table allocation requests can be serviced by some free > page table pool kept at the driver level. We should bear in mind it's also potentially valuable for other aspects of GPU and similar use-cases, like fine-grained memory accounting and resource limiting. That's a significant factor in this approach vs. internal caching schemes that could only solve the specific reclaim concern. > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> > --- > drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++----------- > drivers/iommu/io-pgtable.c | 12 ++++++++ > 2 files changed, 46 insertions(+), 16 deletions(-) > > diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c > index 72dcdd468cf3..c5c04f0106f3 100644 > --- a/drivers/iommu/io-pgtable-arm.c > +++ b/drivers/iommu/io-pgtable-arm.c > @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) > } > > static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, > - struct io_pgtable_cfg *cfg) > + struct io_pgtable_cfg *cfg, > + void *cookie) > { > struct device *dev = cfg->iommu_dev; > int order = get_order(size); > - struct page *p; > dma_addr_t dma; > void *pages; > > VM_BUG_ON((gfp & __GFP_HIGHMEM)); > - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > - if (!p) > + > + if (cfg->alloc) { > + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO); I think it should be a fundamental requirement of the interface that an allocator always returns zeroed pages, since there's no obvious use-case for it ever not doing so. Ideally I'd like to not pass a gfp value at all, given the intent that a custom allocator will typically be something *other* than alloc_pages() by necessity, but it's conceivable that contextual flags like GFP_ATOMIC and __GFP_NOWARN could still be meaningful, so ultimately it doesn't really seem worthwhile trying to re-encode them differently just for the sake of it. > + } else { > + struct page *p; > + > + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); > + pages = p ? 
page_address(p) : NULL; > + } > + > + if (!pages) > return NULL; > > - pages = page_address(p); > if (!cfg->coherent_walk) { > dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); > if (dma_mapping_error(dev, dma)) > @@ -220,18 +228,28 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, > out_unmap: > dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n"); > dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); > + > out_free: > - __free_pages(p, order); > + if (cfg->free) > + cfg->free(cookie, pages, size); > + else > + free_pages((unsigned long)pages, order); > + > return NULL; > } > > static void __arm_lpae_free_pages(void *pages, size_t size, > - struct io_pgtable_cfg *cfg) > + struct io_pgtable_cfg *cfg, > + void *cookie) > { > if (!cfg->coherent_walk) > dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages), > size, DMA_TO_DEVICE); > - free_pages((unsigned long)pages, get_order(size)); > + > + if (cfg->free) > + cfg->free(cookie, pages, size); > + else > + free_pages((unsigned long)pages, get_order(size)); > } > > static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries, > @@ -373,13 +391,13 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, > /* Grab a pointer to the next level */ > pte = READ_ONCE(*ptep); > if (!pte) { > - cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg); > + cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg, data->iop.cookie); > if (!cptep) > return -ENOMEM; > > pte = arm_lpae_install_table(cptep, ptep, 0, data); > if (pte) > - __arm_lpae_free_pages(cptep, tblsz, cfg); > + __arm_lpae_free_pages(cptep, tblsz, cfg, data->iop.cookie); > } else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) { > __arm_lpae_sync_pte(ptep, 1, cfg); > } > @@ -524,7 +542,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, > __arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data)); > } > > - __arm_lpae_free_pages(start, table_size, &data->iop.cfg); > + __arm_lpae_free_pages(start, table_size, &data->iop.cfg, data->iop.cookie); > } > > static void arm_lpae_free_pgtable(struct io_pgtable *iop) > @@ -552,7 +570,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, > if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) > return 0; > > - tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg); > + tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie); > if (!tablep) > return 0; /* Bytes unmapped */ > > @@ -575,7 +593,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, > > pte = arm_lpae_install_table(tablep, ptep, blk_pte, data); > if (pte != blk_pte) { > - __arm_lpae_free_pages(tablep, tablesz, cfg); > + __arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie); > /* > * We may race against someone unmapping another part of this > * block, but anything else is invalid. 
We can't misinterpret > @@ -882,7 +900,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) > > /* Looking good; allocate a pgd */ > data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), > - GFP_KERNEL, cfg); > + GFP_KERNEL, cfg, cookie); > if (!data->pgd) > goto out_free_data; > > @@ -984,7 +1002,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) > > /* Allocate pgd pages */ > data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), > - GFP_KERNEL, cfg); > + GFP_KERNEL, cfg, cookie); > if (!data->pgd) > goto out_free_data; > > @@ -1059,7 +1077,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) > << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); > > data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL, > - cfg); > + cfg, cookie); > if (!data->pgd) > goto out_free_data; > > diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c > index f4caf630638a..e273c18ae22b 100644 > --- a/drivers/iommu/io-pgtable.c > +++ b/drivers/iommu/io-pgtable.c > @@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt, > if (!cfg->alloc) > return 0; > > + switch (fmt) { > + case ARM_32_LPAE_S1: > + case ARM_32_LPAE_S2: > + case ARM_64_LPAE_S1: > + case ARM_64_LPAE_S2: > + case ARM_MALI_LPAE: > + return 0; I remain not entirely convinced by the value of this, but could it at least be done in a more scalable manner like some kind of flag provided by the format itself? Thanks, Robin. > + > + default: > + break; > + } > + > return -EINVAL; > } >
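Putting Robin's requirements together with the hook signatures from the patch, the driver-facing wiring could look something like the sketch below: the allocator is responsible for returning zeroed memory, gfp only carries contextual flags, and the cookie passed to alloc_io_pgtable_ops() is what the hooks get back. Apart from alloc_io_pgtable_ops() and the cfg fields, the names reuse the hypothetical pool sketched earlier:

```c
/*
 * Sketch of how a driver might install the custom hooks, assuming the
 * io_pgtable_cfg::alloc/::free fields added earlier in this series. The
 * pool type and callbacks are the hypothetical ones sketched above; the
 * usual cfg fields (pgsize_bitmap, ias/oas, tlb ops, iommu_dev, ...) are
 * assumed to be filled in by the caller.
 */
#include <linux/io-pgtable.h>

static struct io_pgtable_ops *
example_vm_create_pgtable(struct io_pgtable_cfg *cfg, struct example_pt_pool *pool)
{
	cfg->alloc = example_pt_pool_alloc;	/* must hand back zeroed memory */
	cfg->free = example_pt_pool_free;

	/*
	 * The cookie is passed back to the hooks (and to the TLB callbacks),
	 * so a real driver would typically pass its VM object here and reach
	 * the pool through it; the pool is used directly to keep the sketch
	 * short.
	 */
	return alloc_io_pgtable_ops(ARM_64_LPAE_S1, cfg, pool);
}
```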
On Wed, 20 Sep 2023 17:42:01 +0100 Robin Murphy <robin.murphy@arm.com> wrote: > On 09/08/2023 1:17 pm, Boris Brezillon wrote: > > We need that in order to implement the VM_BIND ioctl in the GPU driver > > targeting new Mali GPUs. > > > > VM_BIND is about executing MMU map/unmap requests asynchronously, > > possibly after waiting for external dependencies encoded as dma_fences. > > We intend to use the drm_sched framework to automate the dependency > > tracking and VM job dequeuing logic, but this comes with its own set > > of constraints, one of them being the fact we are not allowed to > > allocate memory in the drm_gpu_scheduler_ops::run_job() to avoid this > > sort of deadlocks: > > > > - VM_BIND map job needs to allocate a page table to map some memory > > to the VM. No memory available, so kswapd is kicked > > - GPU driver shrinker backend ends up waiting on the fence attached to > > the VM map job or any other job fence depending on this VM operation. > > > > With custom allocators, we will be able to pre-reserve enough pages to > > guarantee the map/unmap operations we queued will take place without > > going through the system allocator. But we can also optimize > > allocation/reservation by not free-ing pages immediately, so any > > upcoming page table allocation requests can be serviced by some free > > page table pool kept at the driver level. > > We should bear in mind it's also potentially valuable for other aspects > of GPU and similar use-cases, like fine-grained memory accounting and > resource limiting. That's a significant factor in this approach vs. > internal caching schemes that could only solve the specific reclaim concern. I mentioned these other cases in v2. Let me know if that's not detailed enough. > > diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c > > index f4caf630638a..e273c18ae22b 100644 > > --- a/drivers/iommu/io-pgtable.c > > +++ b/drivers/iommu/io-pgtable.c > > @@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt, > > if (!cfg->alloc) > > return 0; > > > > + switch (fmt) { > > + case ARM_32_LPAE_S1: > > + case ARM_32_LPAE_S2: > > + case ARM_64_LPAE_S1: > > + case ARM_64_LPAE_S2: > > + case ARM_MALI_LPAE: > > + return 0; > > I remain not entirely convinced by the value of this, but could it at > least be done in a more scalable manner like some kind of flag provided > by the format itself? I added a caps flag to io_pgtable_init_fns in v2. Feels a bit weird to add a field that's not a function pointer in a struct that's prefixed with _fns (which I guess stands for _functions), but oh well. Regards, Boris
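For reference, the caps-based check Robin asks for (and which Boris says v2 implements) could take roughly the following shape. The field and flag names below are guesses, not taken from the posted v2:

```c
/*
 * Guess at the caps-based check discussed here, as seen from io-pgtable.c:
 * each io_pgtable_init_fns entry advertises whether it honours custom
 * allocators, and the core tests that instead of keeping a per-format
 * switch. The 'caps' field and the flag name are assumptions about v2,
 * not something shown in this thread.
 */
#define IO_PGTABLE_CAP_CUSTOM_ALLOCATOR	BIT(0)

struct io_pgtable_init_fns {
	struct io_pgtable *(*alloc)(struct io_pgtable_cfg *cfg, void *cookie);
	void (*free)(struct io_pgtable *iop);
	u32 caps;	/* hypothetical IO_PGTABLE_CAP_* flags */
};

static int check_custom_allocator(enum io_pgtable_fmt fmt,
				  struct io_pgtable_cfg *cfg)
{
	/* No custom allocator, no restrictions. */
	if (!cfg->alloc)
		return 0;

	/* Otherwise the format has to opt in explicitly. */
	if (io_pgtable_init_table[fmt] &&
	    (io_pgtable_init_table[fmt]->caps & IO_PGTABLE_CAP_CUSTOM_ALLOCATOR))
		return 0;

	return -EINVAL;
}
```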
diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 72dcdd468cf3..c5c04f0106f3 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -188,20 +188,28 @@ static dma_addr_t __arm_lpae_dma_addr(void *pages) } static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, - struct io_pgtable_cfg *cfg) + struct io_pgtable_cfg *cfg, + void *cookie) { struct device *dev = cfg->iommu_dev; int order = get_order(size); - struct page *p; dma_addr_t dma; void *pages; VM_BUG_ON((gfp & __GFP_HIGHMEM)); - p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); - if (!p) + + if (cfg->alloc) { + pages = cfg->alloc(cookie, size, gfp | __GFP_ZERO); + } else { + struct page *p; + + p = alloc_pages_node(dev_to_node(dev), gfp | __GFP_ZERO, order); + pages = p ? page_address(p) : NULL; + } + + if (!pages) return NULL; - pages = page_address(p); if (!cfg->coherent_walk) { dma = dma_map_single(dev, pages, size, DMA_TO_DEVICE); if (dma_mapping_error(dev, dma)) @@ -220,18 +228,28 @@ static void *__arm_lpae_alloc_pages(size_t size, gfp_t gfp, out_unmap: dev_err(dev, "Cannot accommodate DMA translation for IOMMU page tables\n"); dma_unmap_single(dev, dma, size, DMA_TO_DEVICE); + out_free: - __free_pages(p, order); + if (cfg->free) + cfg->free(cookie, pages, size); + else + free_pages((unsigned long)pages, order); + return NULL; } static void __arm_lpae_free_pages(void *pages, size_t size, - struct io_pgtable_cfg *cfg) + struct io_pgtable_cfg *cfg, + void *cookie) { if (!cfg->coherent_walk) dma_unmap_single(cfg->iommu_dev, __arm_lpae_dma_addr(pages), size, DMA_TO_DEVICE); - free_pages((unsigned long)pages, get_order(size)); + + if (cfg->free) + cfg->free(cookie, pages, size); + else + free_pages((unsigned long)pages, get_order(size)); } static void __arm_lpae_sync_pte(arm_lpae_iopte *ptep, int num_entries, @@ -373,13 +391,13 @@ static int __arm_lpae_map(struct arm_lpae_io_pgtable *data, unsigned long iova, /* Grab a pointer to the next level */ pte = READ_ONCE(*ptep); if (!pte) { - cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg); + cptep = __arm_lpae_alloc_pages(tblsz, gfp, cfg, data->iop.cookie); if (!cptep) return -ENOMEM; pte = arm_lpae_install_table(cptep, ptep, 0, data); if (pte) - __arm_lpae_free_pages(cptep, tblsz, cfg); + __arm_lpae_free_pages(cptep, tblsz, cfg, data->iop.cookie); } else if (!cfg->coherent_walk && !(pte & ARM_LPAE_PTE_SW_SYNC)) { __arm_lpae_sync_pte(ptep, 1, cfg); } @@ -524,7 +542,7 @@ static void __arm_lpae_free_pgtable(struct arm_lpae_io_pgtable *data, int lvl, __arm_lpae_free_pgtable(data, lvl + 1, iopte_deref(pte, data)); } - __arm_lpae_free_pages(start, table_size, &data->iop.cfg); + __arm_lpae_free_pages(start, table_size, &data->iop.cfg, data->iop.cookie); } static void arm_lpae_free_pgtable(struct io_pgtable *iop) @@ -552,7 +570,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, if (WARN_ON(lvl == ARM_LPAE_MAX_LEVELS)) return 0; - tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg); + tablep = __arm_lpae_alloc_pages(tablesz, GFP_ATOMIC, cfg, data->iop.cookie); if (!tablep) return 0; /* Bytes unmapped */ @@ -575,7 +593,7 @@ static size_t arm_lpae_split_blk_unmap(struct arm_lpae_io_pgtable *data, pte = arm_lpae_install_table(tablep, ptep, blk_pte, data); if (pte != blk_pte) { - __arm_lpae_free_pages(tablep, tablesz, cfg); + __arm_lpae_free_pages(tablep, tablesz, cfg, data->iop.cookie); /* * We may race against someone unmapping another part of this * block, but anything else is invalid. 
We can't misinterpret @@ -882,7 +900,7 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) /* Looking good; allocate a pgd */ data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), - GFP_KERNEL, cfg); + GFP_KERNEL, cfg, cookie); if (!data->pgd) goto out_free_data; @@ -984,7 +1002,7 @@ arm_64_lpae_alloc_pgtable_s2(struct io_pgtable_cfg *cfg, void *cookie) /* Allocate pgd pages */ data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), - GFP_KERNEL, cfg); + GFP_KERNEL, cfg, cookie); if (!data->pgd) goto out_free_data; @@ -1059,7 +1077,7 @@ arm_mali_lpae_alloc_pgtable(struct io_pgtable_cfg *cfg, void *cookie) << ARM_LPAE_MAIR_ATTR_SHIFT(ARM_LPAE_MAIR_ATTR_IDX_DEV)); data->pgd = __arm_lpae_alloc_pages(ARM_LPAE_PGD_SIZE(data), GFP_KERNEL, - cfg); + cfg, cookie); if (!data->pgd) goto out_free_data; diff --git a/drivers/iommu/io-pgtable.c b/drivers/iommu/io-pgtable.c index f4caf630638a..e273c18ae22b 100644 --- a/drivers/iommu/io-pgtable.c +++ b/drivers/iommu/io-pgtable.c @@ -47,6 +47,18 @@ static int check_custom_allocator(enum io_pgtable_fmt fmt, if (!cfg->alloc) return 0; + switch (fmt) { + case ARM_32_LPAE_S1: + case ARM_32_LPAE_S2: + case ARM_64_LPAE_S1: + case ARM_64_LPAE_S2: + case ARM_MALI_LPAE: + return 0; + + default: + break; + } + return -EINVAL; }
We need that in order to implement the VM_BIND ioctl in the GPU driver
targeting new Mali GPUs.

VM_BIND is about executing MMU map/unmap requests asynchronously,
possibly after waiting for external dependencies encoded as dma_fences.
We intend to use the drm_sched framework to automate the dependency
tracking and VM job dequeuing logic, but this comes with its own set of
constraints, one of them being the fact we are not allowed to allocate
memory in the drm_gpu_scheduler_ops::run_job() to avoid this sort of
deadlocks:

- VM_BIND map job needs to allocate a page table to map some memory
  to the VM. No memory available, so kswapd is kicked
- GPU driver shrinker backend ends up waiting on the fence attached to
  the VM map job or any other job fence depending on this VM operation.

With custom allocators, we will be able to pre-reserve enough pages to
guarantee the map/unmap operations we queued will take place without
going through the system allocator. But we can also optimize
allocation/reservation by not free-ing pages immediately, so any
upcoming page table allocation requests can be serviced by some free
page table pool kept at the driver level.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
 drivers/iommu/io-pgtable-arm.c | 50 +++++++++++++++++++++++-----------
 drivers/iommu/io-pgtable.c     | 12 ++++++++
 2 files changed, 46 insertions(+), 16 deletions(-)