Message ID: 55684F1C.3050702@arm.com (mailing list archive)
State: New, archived
On Fri, May 29, 2015 at 12:35:56PM +0100, Robin Murphy wrote:
> The trouble with this is, what about the CPU page size? Say you have
> some multimedia subsystem with its own integrated SMMU and for that
> they've only implemented the 16K granule scheme because it works
> best for the video hardware (and the GPU driver is making direct
> IOMMU API calls to remap carved-out RAM rather than using
> DMA-mapping). Now, the SMMU on the compute side of the SoC serving
> the general peripherals will be rendered useless by bumping the
> system-wide minimum page size up to 16K, because it then can't map
> that scatterlist of discontiguous 4K pages that the USB controller
> needs...
>
> I think this really represents another push to get away from (or at
> least around) the page-at-a-time paradigm - if the IOMMU API itself
> wasn't too fussed about page sizes and could let drivers handle the
> full map/unmap requests however they see fit, I think we could
> bypass a lot of these issues. We've already got the Intel IOMMU
> driver doing horrible hacks with the pgsize_bitmap to cheat the
> system, I'm sure we don't want to add any more of that. How about
> something like the below diff as a first step?

Moving functionality out of the iommu core code into the drivers is the
wrong direction, imo. It is better to solve it with something like

	struct iommu_domain *iommu_domain_alloc_for_group(struct iommu_group *group);

which gets us a domain that can only be assigned to that particular
group. Since there is a clear one-to-many relationship between a
hardware IOMMU and the groups of devices behind it, we could propagate
the pgsize_bitmap from the IOMMU to the group and then to the domain.

Domains allocated via iommu_domain_alloc() would get the merged
pgsize_bitmap, as I described in my previous mail.

But to make this happen we need a representation of single hardware
IOMMU instances in the iommu core first.

	Joerg
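To make the shape of this concrete, a minimal sketch of what the proposed
allocator could look like, assuming the core grew a struct iommu_instance
representing one hardware IOMMU plus a back-pointer from each group to its
instance -- neither of which exists today, which is exactly the closing
point above. domain->pgsize_bitmap and domain->group are equally
hypothetical:

	/*
	 * Sketch only: struct iommu_instance, group->iommu, domain->group
	 * and domain->pgsize_bitmap are all hypothetical -- the core has
	 * no representation of a hardware IOMMU instance yet.
	 */
	struct iommu_domain *iommu_domain_alloc_for_group(struct iommu_group *group)
	{
		struct iommu_instance *iommu = group->iommu;
		struct iommu_domain *domain;

		domain = iommu->ops->domain_alloc(IOMMU_DOMAIN_UNMANAGED);
		if (!domain)
			return NULL;

		/* propagate page sizes: hardware instance -> group -> domain */
		domain->pgsize_bitmap = iommu->pgsize_bitmap;

		/* remember the group so attach can reject any other group */
		domain->group = group;

		return domain;
	}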
Hi Joerg,

On Fri, May 29, 2015 at 03:40:43PM +0100, Joerg Roedel wrote:
> On Fri, May 29, 2015 at 12:35:56PM +0100, Robin Murphy wrote:
> > [...]
>
> Moving functionality out of the iommu core code into the drivers is the
> wrong direction, imo. It is better to solve it with something like
>
> 	struct iommu_domain *iommu_domain_alloc_for_group(struct iommu_group *group);
>
> which gets us a domain that can only be assigned to that particular
> group. Since there is a clear one-to-many relationship between a
> hardware IOMMU and the groups of devices behind it, we could propagate
> the pgsize_bitmap from the IOMMU to the group and then to the domain.
>
> Domains allocated via iommu_domain_alloc() would get the merged
> pgsize_bitmap, as I described in my previous mail.
>
> But to make this happen we need a representation of single hardware
> IOMMU instances in the iommu core first.

I like this proposal. The only remaining part is that the pgsize bitmap
inherited by the domain can only truly be finalised once the page table
is allocated, which in turn can only happen once we've identified the
IOMMU instance.

In the case of iommu_domain_alloc_for_group, that's nice and
straightforward (i.e. as part of the call), but for the default
iommu_domain_alloc() case we'd have to continue postponing things to
->attach time before we could provide a reliable pgsize_bitmap. This is
to handle the case where an SMMU may be able to support only a subset of
the pgsize_bitmap on a per-domain basis.

In which situation do you think the merged pgsize_bitmap would get used?
The only place I can think of is if we're trying to call iommu_map on a
domain with no devices attached. However, if that domain was created
using iommu_domain_alloc() then the pgsize_bitmap is the least of our
worries -- we don't even know the page table format!

Will
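A rough sketch of the postponement being described -- a driver finalising
the page-table format (and hence the usable page sizes) at first attach,
once the IOMMU instance is known. All structures and helpers here are
illustrative stand-ins, not the actual arm-smmu code:

	/*
	 * Illustrative only: smmu_domain/smmu_device and all helpers are
	 * made-up stand-ins for whatever the real driver would use.
	 */
	static int smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
	{
		struct smmu_domain *smmu_domain = to_smmu_domain(domain);
		struct smmu_device *smmu = dev_to_smmu(dev);

		if (!smmu_domain->smmu) {
			/*
			 * First attach: the instance is now known, so the
			 * page-table format and pgsize_bitmap can be fixed.
			 */
			smmu_domain->pgtbl_ops = alloc_pgtable_for(smmu, domain);
			if (!smmu_domain->pgtbl_ops)
				return -ENOMEM;
			smmu_domain->pgsize_bitmap = pgsizes_supported(smmu, domain);
			smmu_domain->smmu = smmu;
		} else if (smmu_domain->smmu != smmu) {
			/* already finalised against a different instance */
			return -EINVAL;
		}

		return install_context(smmu, smmu_domain, dev);
	}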
Hi Will,

On Mon, Jun 01, 2015 at 10:40:14AM +0100, Will Deacon wrote:
> I like this proposal. The only remaining part is that the pgsize bitmap
> inherited by the domain can only truly be finalised once the page table
> is allocated, which in turn can only happen once we've identified the
> IOMMU instance.

We know the IOMMU instance for domains allocated with
iommu_domain_alloc_for_group because every group is behind a single
IOMMU instance. So the page table can be allocated at domain creation
time, and mappings can be created before the group itself is attached.

> In the case of iommu_domain_alloc_for_group, that's nice and
> straightforward (i.e. as part of the call), but for the default
> iommu_domain_alloc() case we'd have to continue postponing things to
> ->attach time before we could provide a reliable pgsize_bitmap. This is
> to handle the case where an SMMU may be able to support only a subset of
> the pgsize_bitmap on a per-domain basis.

I don't think we need to postpone anything. Domains returned by
iommu_domain_alloc() need to work for all groups. If there are multiple
IOMMUs in the system with different capabilities/page-table formats, the
IOMMU core needs to build multiple page tables for that domain. VFIO
already does that as a workaround for how things are done currently, but
I really think this should be done in the IOMMU core code.

> In which situation do you think the merged pgsize_bitmap would get used?
> The only place I can think of is if we're trying to call iommu_map on a
> domain with no devices attached. However, if that domain was created
> using iommu_domain_alloc() then the pgsize_bitmap is the least of our
> worries -- we don't even know the page table format!

The first use-case for the merged pgsize_bitmap that comes to mind is to
determine the minimum page size that can be used for a domain. In the
SMMUv3 case, when there are IOMMUs with different minimum page sizes in
one system, this would be the biggest minimum page size of all IOMMUs.
Probably there are other uses for that bitmap that will show up at
implementation time.

	Joerg
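For illustration: the minimum page size of a single bitmap is its lowest
set bit (the same trick iommu_map() uses today to compute min_pagesz),
so the "biggest minimum" could be derived as below. The instance list
and its fields are hypothetical, since the core has no per-instance
representation yet:

	/* Hypothetical: no core-level list of IOMMU instances exists yet. */
	static size_t merged_min_pagesz(void)
	{
		struct iommu_instance *iommu;
		size_t min_pagesz = 0;

		list_for_each_entry(iommu, &iommu_instances, list) {
			/* lowest set bit = smallest page size of this instance */
			size_t pgsz = 1UL << __ffs(iommu->pgsize_bitmap);

			min_pagesz = max(min_pagesz, pgsz);
		}

		return min_pagesz;	/* biggest minimum across all IOMMUs */
	}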
On Tue, Jun 02, 2015 at 08:39:56AM +0100, Joerg Roedel wrote:
> On Mon, Jun 01, 2015 at 10:40:14AM +0100, Will Deacon wrote:
> > I like this proposal. The only remaining part is that the pgsize bitmap
> > inherited by the domain can only truly be finalised once the page table
> > is allocated, which in turn can only happen once we've identified the
> > IOMMU instance.
>
> We know the IOMMU instance for domains allocated with
> iommu_domain_alloc_for_group because every group is behind a single
> IOMMU instance. So the page table can be allocated at domain creation
> time, and mappings can be created before the group itself is attached.

Indeed -- I think this form of allocation makes a lot of sense.

> > In the case of iommu_domain_alloc_for_group, that's nice and
> > straightforward (i.e. as part of the call), but for the default
> > iommu_domain_alloc() case we'd have to continue postponing things to
> > ->attach time before we could provide a reliable pgsize_bitmap. This is
> > to handle the case where an SMMU may be able to support only a subset of
> > the pgsize_bitmap on a per-domain basis.
>
> I don't think we need to postpone anything. Domains returned by
> iommu_domain_alloc() need to work for all groups. If there are multiple
> IOMMUs in the system with different capabilities/page-table formats, the
> IOMMU core needs to build multiple page tables for that domain.

I really don't think that's feasible. We support an awful lot of
combinations that affect the page-table format on ARM: there are 5
different formats, each supporting different translation regimes (sets
of page sizes), and each of those needs configuring for input/output
address sizes to compute the number of levels. You could easily end up
with over 20 page tables and their corresponding sets of control
register values.

Given that we only install one page table in the end, I'm struggling to
see the issue with postponing its allocation until we've figured out
the IOMMU instance thanks to a device attach.

> VFIO already does that as a workaround for how things are done currently,
> but I really think this should be done in the IOMMU core code.

Postponing page-table creation to ->attach works fine with VFIO today.
AFAICT, it never tries to map pages for an empty domain, and therefore
the IOMMU instance is always known by the IOMMU driver at ->map time.

> > In which situation do you think the merged pgsize_bitmap would get used?
> > The only place I can think of is if we're trying to call iommu_map on a
> > domain with no devices attached. However, if that domain was created
> > using iommu_domain_alloc() then the pgsize_bitmap is the least of our
> > worries -- we don't even know the page table format!
>
> The first use-case for the merged pgsize_bitmap that comes to mind is to
> determine the minimum page size that can be used for a domain. In the
> SMMUv3 case, when there are IOMMUs with different minimum page sizes in
> one system, this would be the biggest minimum page size of all IOMMUs.

Understood, but I'm trying to understand why you'd want to know that for
an empty (no devices attached) domain. I don't think we currently have
any use-cases for that, and it's really not a practical thing to support.

Will
On Tue, Jun 02, 2015 at 10:47:46AM +0100, Will Deacon wrote:
> On Tue, Jun 02, 2015 at 08:39:56AM +0100, Joerg Roedel wrote:
> > I don't think we need to postpone anything. Domains returned by
> > iommu_domain_alloc() need to work for all groups. If there are multiple
> > IOMMUs in the system with different capabilities/page-table formats, the
> > IOMMU core needs to build multiple page tables for that domain.
>
> I really don't think that's feasible. We support an awful lot of
> combinations that affect the page-table format on ARM: there are 5
> different formats, each supporting different translation regimes (sets
> of page sizes), and each of those needs configuring for input/output
> address sizes to compute the number of levels. You could easily end up
> with over 20 page tables and their corresponding sets of control
> register values.

We only need to build page tables for the IOMMUs that are actually in
the system, and only for domains that are not tied to a particular
group. If some hardware vendor is crazy enough to put 20 IOMMUs in the
system, each with its own page-table format, then so be it. But in
reality I think we will see only a handful of different page tables.

> Given that we only install one page table in the end, I'm struggling to
> see the issue with postponing its allocation until we've figured out
> the IOMMU instance thanks to a device attach.

It is a question of the use-case for the domain. A device driver (like
for USB or graphics) would allocate the domain with the _for_group
function and end up with only one page table. But if VFIO comes around
and wants to attach devices to a KVM guest, it would use
iommu_domain_alloc() and then has all the page tables it possibly needs
(for devices attached at start, and even devices that might later be
hotplugged into the guest).

	Joerg
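A sketch of what such a multi-page-table domain might look like in the
core: one page table per distinct format present in the system, with
every map request replayed into all of them so that any group can be
attached later. All types and fields below are hypothetical:

	/* Hypothetical core-side types; nothing like this exists today. */
	struct core_pgtable {
		struct list_head		list;
		const struct pgtable_ops	*ops;	/* one per format */
		void				*tables;
	};

	struct core_domain {
		struct iommu_domain	domain;
		struct list_head	pgtables;	/* one entry per format */
	};

	static int core_domain_map(struct iommu_domain *domain, unsigned long iova,
				   phys_addr_t paddr, size_t size, int prot)
	{
		struct core_domain *cd = container_of(domain, struct core_domain, domain);
		struct core_pgtable *pgt;
		int ret;

		/* mirror the mapping into every format so a late attach just works */
		list_for_each_entry(pgt, &cd->pgtables, list) {
			ret = pgt->ops->map(pgt, iova, paddr, size, prot);
			if (ret)
				return ret;	/* a real version would unwind here */
		}

		return 0;
	}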
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index e43d489..13e94ae 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3418,8 +3418,8 @@ static const struct iommu_ops amd_iommu_ops = {
 	.domain_free = amd_iommu_domain_free,
 	.attach_dev = amd_iommu_attach_device,
 	.detach_dev = amd_iommu_detach_device,
-	.map = amd_iommu_map,
-	.unmap = amd_iommu_unmap,
+	.map_page = amd_iommu_map,
+	.unmap_page = amd_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = amd_iommu_iova_to_phys,
 	.pgsize_bitmap = AMD_IOMMU_PGSIZES,
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 66a803b..427ae40 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1447,8 +1447,8 @@ static struct iommu_ops arm_smmu_ops = {
 	.domain_free = arm_smmu_domain_free,
 	.attach_dev = arm_smmu_attach_dev,
 	.detach_dev = arm_smmu_detach_dev,
-	.map = arm_smmu_map,
-	.unmap = arm_smmu_unmap,
+	.map_page = arm_smmu_map,
+	.unmap_page = arm_smmu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = arm_smmu_iova_to_phys,
 	.add_device = arm_smmu_add_device,
diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 3e89850..9edf14a 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -1182,8 +1182,8 @@ static const struct iommu_ops exynos_iommu_ops = {
 	.domain_free = exynos_iommu_domain_free,
 	.attach_dev = exynos_iommu_attach_device,
 	.detach_dev = exynos_iommu_detach_device,
-	.map = exynos_iommu_map,
-	.unmap = exynos_iommu_unmap,
+	.map_page = exynos_iommu_map,
+	.unmap_page = exynos_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = exynos_iommu_iova_to_phys,
 	.add_device = exynos_iommu_add_device,
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 68d43be..67de73d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4597,8 +4597,8 @@ static const struct iommu_ops intel_iommu_ops = {
 	.domain_free = intel_iommu_domain_free,
 	.attach_dev = intel_iommu_attach_device,
 	.detach_dev = intel_iommu_detach_device,
-	.map = intel_iommu_map,
-	.unmap = intel_iommu_unmap,
+	.map_page = intel_iommu_map,
+	.unmap_page = intel_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = intel_iommu_iova_to_phys,
 	.add_device = intel_iommu_add_device,
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d4f527e..7312623 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1033,7 +1033,10 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 	size_t orig_size = size;
 	int ret = 0;
 
-	if (unlikely(domain->ops->map == NULL ||
+	if (domain->ops->map)
+		return domain->ops->map(domain, iova, paddr, size, prot);
+
+	if (unlikely(domain->ops->map_page == NULL ||
 		     domain->ops->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -1062,7 +1065,7 @@ int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		pr_debug("mapping: iova 0x%lx pa %pa pgsize 0x%zx\n",
 			 iova, &paddr, pgsize);
 
-		ret = domain->ops->map(domain, iova, paddr, pgsize, prot);
+		ret = domain->ops->map_page(domain, iova, paddr, pgsize, prot);
 		if (ret)
 			break;
 
@@ -1087,7 +1090,10 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 	unsigned int min_pagesz;
 	unsigned long orig_iova = iova;
 
-	if (unlikely(domain->ops->unmap == NULL ||
+	if (domain->ops->unmap)
+		return domain->ops->unmap(domain, iova, size);
+
+	if (unlikely(domain->ops->unmap_page == NULL ||
 		     domain->ops->pgsize_bitmap == 0UL))
 		return -ENODEV;
 
@@ -1117,7 +1123,7 @@ size_t iommu_unmap(struct iommu_domain *domain, unsigned long iova, size_t size)
 	while (unmapped < size) {
 		size_t pgsize = iommu_pgsize(domain, iova, size - unmapped);
 
-		unmapped_page = domain->ops->unmap(domain, iova, pgsize);
+		unmapped_page = domain->ops->unmap_page(domain, iova, pgsize);
 		if (!unmapped_page)
 			break;
 
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index 1a67c53..1db3486 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -746,8 +746,8 @@ static const struct iommu_ops ipmmu_ops = {
 	.domain_free = ipmmu_domain_free,
 	.attach_dev = ipmmu_attach_device,
 	.detach_dev = ipmmu_detach_device,
-	.map = ipmmu_map,
-	.unmap = ipmmu_unmap,
+	.map_page = ipmmu_map,
+	.unmap_page = ipmmu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = ipmmu_iova_to_phys,
 	.add_device = ipmmu_add_device,
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 15a2063..ea40c86 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -677,8 +677,8 @@ static const struct iommu_ops msm_iommu_ops = {
 	.domain_free = msm_iommu_domain_free,
 	.attach_dev = msm_iommu_attach_dev,
 	.detach_dev = msm_iommu_detach_dev,
-	.map = msm_iommu_map,
-	.unmap = msm_iommu_unmap,
+	.map_page = msm_iommu_map,
+	.unmap_page = msm_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = msm_iommu_iova_to_phys,
 	.pgsize_bitmap = MSM_IOMMU_PGSIZES,
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index a22c33d..784d29c 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1371,8 +1371,8 @@ static const struct iommu_ops omap_iommu_ops = {
 	.domain_free = omap_iommu_domain_free,
 	.attach_dev = omap_iommu_attach_dev,
 	.detach_dev = omap_iommu_detach_dev,
-	.map = omap_iommu_map,
-	.unmap = omap_iommu_unmap,
+	.map_page = omap_iommu_map,
+	.unmap_page = omap_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = omap_iommu_iova_to_phys,
 	.add_device = omap_iommu_add_device,
diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index cab2145..b10d5b2 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -964,8 +964,8 @@ static const struct iommu_ops rk_iommu_ops = {
 	.domain_free = rk_iommu_domain_free,
 	.attach_dev = rk_iommu_attach_device,
 	.detach_dev = rk_iommu_detach_device,
-	.map = rk_iommu_map,
-	.unmap = rk_iommu_unmap,
+	.map_page = rk_iommu_map,
+	.unmap_page = rk_iommu_unmap,
 	.add_device = rk_iommu_add_device,
 	.remove_device = rk_iommu_remove_device,
 	.iova_to_phys = rk_iommu_iova_to_phys,
diff --git a/drivers/iommu/shmobile-iommu.c b/drivers/iommu/shmobile-iommu.c
index a028751..f587313 100644
--- a/drivers/iommu/shmobile-iommu.c
+++ b/drivers/iommu/shmobile-iommu.c
@@ -366,8 +366,8 @@ static const struct iommu_ops shmobile_iommu_ops = {
 	.domain_free = shmobile_iommu_domain_free,
 	.attach_dev = shmobile_iommu_attach_device,
 	.detach_dev = shmobile_iommu_detach_device,
-	.map = shmobile_iommu_map,
-	.unmap = shmobile_iommu_unmap,
+	.map_page = shmobile_iommu_map,
+	.unmap_page = shmobile_iommu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = shmobile_iommu_iova_to_phys,
 	.add_device = shmobile_iommu_add_device,
diff --git a/drivers/iommu/tegra-gart.c b/drivers/iommu/tegra-gart.c
index 37e708f..8e4d6f1 100644
--- a/drivers/iommu/tegra-gart.c
+++ b/drivers/iommu/tegra-gart.c
@@ -340,9 +340,9 @@ static const struct iommu_ops gart_iommu_ops = {
 	.domain_free = gart_iommu_domain_free,
 	.attach_dev = gart_iommu_attach_dev,
 	.detach_dev = gart_iommu_detach_dev,
-	.map = gart_iommu_map,
+	.map_page = gart_iommu_map,
 	.map_sg = default_iommu_map_sg,
-	.unmap = gart_iommu_unmap,
+	.unmap_page = gart_iommu_unmap,
 	.iova_to_phys = gart_iommu_iova_to_phys,
 	.pgsize_bitmap = GART_IOMMU_PGSIZES,
 };
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index c845d99..38915fc 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -650,8 +650,8 @@ static const struct iommu_ops tegra_smmu_ops = {
 	.detach_dev = tegra_smmu_detach_dev,
 	.add_device = tegra_smmu_add_device,
 	.remove_device = tegra_smmu_remove_device,
-	.map = tegra_smmu_map,
-	.unmap = tegra_smmu_unmap,
+	.map_page = tegra_smmu_map,
+	.unmap_page = tegra_smmu_unmap,
 	.map_sg = default_iommu_map_sg,
 	.iova_to_phys = tegra_smmu_iova_to_phys,
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0546b87..5857bd6 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -124,6 +124,8 @@ enum iommu_attr {
  * @detach_dev: detach device from an iommu domain
  * @map: map a physically contiguous memory region to an iommu domain
  * @unmap: unmap a physically contiguous memory region from an iommu domain
+ * @map_page: map a physical region corresponding to a single iommu page
+ * @unmap_page: unmap a physical region corresponding to a single iommu page
  * @map_sg: map a scatter-gather list of physically contiguous memory
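For reference, under the diff above a driver that wants to handle whole
ranges itself would implement the ->map/->unmap hooks (which the core
now checks first) and simply omit the per-page ->map_page/->unmap_page
callbacks. A hypothetical driver-side sketch (all my_* names are made
up):

	/* Hypothetical driver: my_domain and my_pgtable_* do not exist. */
	struct my_domain {
		struct iommu_domain	domain;
		struct my_pgtable	*pgtable;
	};

	static struct my_domain *to_my_domain(struct iommu_domain *dom)
	{
		return container_of(dom, struct my_domain, domain);
	}

	/* Receives the caller's full (iova, size) range, not one page */
	static int my_iommu_map(struct iommu_domain *domain, unsigned long iova,
				phys_addr_t paddr, size_t size, int prot)
	{
		/* free to pick any mix of granules/blocks covering the range */
		return my_pgtable_map(to_my_domain(domain)->pgtable,
				      iova, paddr, size, prot);
	}

	static size_t my_iommu_unmap(struct iommu_domain *domain,
				     unsigned long iova, size_t size)
	{
		return my_pgtable_unmap(to_my_domain(domain)->pgtable, iova, size);
	}

	static const struct iommu_ops my_iommu_ops = {
		/* other callbacks omitted */
		.map	= my_iommu_map,		/* full-range hook, checked first */
		.unmap	= my_iommu_unmap,
		/* no .map_page/.unmap_page: no page-at-a-time fallback needed */
	};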