| Message ID | 1410460244-18943-3-git-send-email-mitchelh@codeaurora.org (mailing list archive) |
|---|---|
| State | New, archived |
Hi Mitch,

On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
> Currently, we provide the iommu_ops.iova_to_phys service by doing a
> table walk in software to translate IO virtual addresses to physical
> addresses. On SMMUs that support it, it can be useful to ask the SMMU
> itself to do the translation. This can be used to warm the TLBs for an
> SMMU. It can also be useful for testing and hardware validation.
>
> Since the address translation registers are optional on SMMUv2, only
> enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
> and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

[...]

> +static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
> +					       dma_addr_t iova)
> +{
> +	struct arm_smmu_domain *smmu_domain = domain->priv;
> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
> +	struct device *dev = smmu->dev;
> +	void __iomem *cb_base;
> +	u32 tmp;
> +	u64 phys;
> +
> +	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
> +
> +	if (smmu->version == 1) {
> +		u32 reg = iova & ~0xFFF;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
> +	} else {
> +		u32 reg = iova & ~0xFFF;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
> +		reg = (iova & ~0xFFF) >> 32;
> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
> +	}
> +
> +	if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
> +			       !(tmp & ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
> +		dev_err(dev,
> +			"iova to phys timed out on 0x%pa for %s. Falling back to software table walk.\n",
> +			&iova, dev_name(dev));

dev_err already prints the device name.

> +		return arm_smmu_iova_to_phys_soft(domain, iova);
> +	}
> +
> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
> +
> +	if (phys & CB_PAR_F) {
> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
> +		dev_err(dev, "PAR = 0x%llx\n", phys);
> +	}
> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);

How does this work for 64k pages?

Will
On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon <will.deacon@arm.com> wrote:
> Hi Mitch,
>
> On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
>> Currently, we provide the iommu_ops.iova_to_phys service by doing a
>> table walk in software to translate IO virtual addresses to physical
>> addresses. On SMMUs that support it, it can be useful to ask the SMMU
>> itself to do the translation. This can be used to warm the TLBs for an
>> SMMU. It can also be useful for testing and hardware validation.
>>
>> Since the address translation registers are optional on SMMUv2, only
>> enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
>> and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.
>
> [...]
>
>> +static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
>> +					       dma_addr_t iova)
>> +{
>> +	struct arm_smmu_domain *smmu_domain = domain->priv;
>> +	struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
>> +	struct device *dev = smmu->dev;
>> +	void __iomem *cb_base;
>> +	u32 tmp;
>> +	u64 phys;
>> +
>> +	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
>> +
>> +	if (smmu->version == 1) {
>> +		u32 reg = iova & ~0xFFF;
>> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
>> +	} else {
>> +		u32 reg = iova & ~0xFFF;
>> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
>> +		reg = (iova & ~0xFFF) >> 32;
>> +		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
>> +	}
>> +
>> +	if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
>> +			       !(tmp & ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
>> +		dev_err(dev,
>> +			"iova to phys timed out on 0x%pa for %s. Falling back to software table walk.\n",
>> +			&iova, dev_name(dev));
>
> dev_err already prints the device name.

Ah of course. I'll remove the dev_name.

>
>> +		return arm_smmu_iova_to_phys_soft(domain, iova);
>> +	}
>> +
>> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
>> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
>> +
>> +	if (phys & CB_PAR_F) {
>> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
>> +		dev_err(dev, "PAR = 0x%llx\n", phys);
>> +	}
>> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
>
> How does this work for 64k pages?

So at the moment we're always assuming that we're using v7/v8 long
descriptor format, right? All I see in the spec (14.5.15 SMMU_CBn_PAR)
is that bits[47:12]=>PA[47:12]... Or am I missing something completely?

As a mental note, if we add support for v7 short descriptors (which we
would like to do sometime soon) then we'll have to handle the
supersection case here as well.

-Mitch
On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
> On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
> >> +		return arm_smmu_iova_to_phys_soft(domain, iova);
> >> +	}
> >> +
> >> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
> >> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
> >> +
> >> +	if (phys & CB_PAR_F) {
> >> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
> >> +		dev_err(dev, "PAR = 0x%llx\n", phys);
> >> +	}
> >> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
> >
> > How does this work for 64k pages?
>
> So at the moment we're always assuming that we're using v7/v8 long
> descriptor format, right? All I see in the spec (14.5.15 SMMU_CBn_PAR)
> is that bits[47:12]=>PA[47:12]... Or am I missing something completely?

I think you've got 64k pages confused with the short-descriptor format.

When we use 64k pages with long descriptors, you're masked off bits 15-12 of
the iova above, so you'll have a hole in the physical address afaict.

Will
On Wed, Sep 24 2014 at 09:37:12 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
>> On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
>> >> +		return arm_smmu_iova_to_phys_soft(domain, iova);
>> >> +	}
>> >> +
>> >> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
>> >> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
>> >> +
>> >> +	if (phys & CB_PAR_F) {
>> >> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
>> >> +		dev_err(dev, "PAR = 0x%llx\n", phys);
>> >> +	}
>> >> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
>> >
>> > How does this work for 64k pages?
>>
>> So at the moment we're always assuming that we're using v7/v8 long
>> descriptor format, right? All I see in the spec (14.5.15 SMMU_CBn_PAR)
>> is that bits[47:12]=>PA[47:12]... Or am I missing something completely?
>
> I think you've got 64k pages confused with the short-descriptor format.
>
> When we use 64k pages with long descriptors, you're masked off bits 15-12 of
> the iova above, so you'll have a hole in the physical address afaict.

Even with long descriptors the spec says bits 15-12 should come from
CB_PAR... It makes no mention of reinterpreting those bits depending on
the programmed page granule. The only thing I can conclude from the
spec is that hardware should be smart enough to do the right thing with
bits 15-12 when the page granule is 64k. Although even if hardware is
smart enough I guess CB_PAR[15:12] should be the same as iova[15:12] for
the 64k case?

-Mitch
On Wed, Sep 24, 2014 at 09:34:26PM +0100, Mitchel Humpherys wrote:
> On Wed, Sep 24 2014 at 09:37:12 AM, Will Deacon <will.deacon@arm.com> wrote:
> > On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
> >> On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon <will.deacon@arm.com> wrote:
> >> > On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
> >> >> +		return arm_smmu_iova_to_phys_soft(domain, iova);
> >> >> +	}
> >> >> +
> >> >> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
> >> >> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
> >> >> +
> >> >> +	if (phys & CB_PAR_F) {
> >> >> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
> >> >> +		dev_err(dev, "PAR = 0x%llx\n", phys);
> >> >> +	}
> >> >> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
> >> >
> >> > How does this work for 64k pages?
> >>
> >> So at the moment we're always assuming that we're using v7/v8 long
> >> descriptor format, right? All I see in the spec (14.5.15 SMMU_CBn_PAR)
> >> is that bits[47:12]=>PA[47:12]... Or am I missing something completely?
> >
> > I think you've got 64k pages confused with the short-descriptor format.
> >
> > When we use 64k pages with long descriptors, you're masked off bits 15-12 of
> > the iova above, so you'll have a hole in the physical address afaict.
>
> Even with long descriptors the spec says bits 15-12 should come from
> CB_PAR... It makes no mention of reinterpreting those bits depending on
> the programmed page granule. The only thing I can conclude from the
> spec is that hardware should be smart enough to do the right thing with
> bits 15-12 when the page granule is 64k. Although even if hardware is
> smart enough I guess CB_PAR[15:12] should be the same as iova[15:12] for
> the 64k case?

Yeah, fair enough, the code you have should work correctly then.
Unfortunately, I don't have any suitable hardware on which to test it.

Will
On Fri, Sep 26 2014 at 03:24:30 AM, Will Deacon <will.deacon@arm.com> wrote:
> On Wed, Sep 24, 2014 at 09:34:26PM +0100, Mitchel Humpherys wrote:
>> On Wed, Sep 24 2014 at 09:37:12 AM, Will Deacon <will.deacon@arm.com> wrote:
>> > On Wed, Sep 24, 2014 at 02:12:00AM +0100, Mitchel Humpherys wrote:
>> >> On Mon, Sep 22 2014 at 08:26:14 AM, Will Deacon <will.deacon@arm.com> wrote:
>> >> > On Thu, Sep 11, 2014 at 07:30:44PM +0100, Mitchel Humpherys wrote:
>> >> >> +		return arm_smmu_iova_to_phys_soft(domain, iova);
>> >> >> +	}
>> >> >> +
>> >> >> +	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
>> >> >> +	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
>> >> >> +
>> >> >> +	if (phys & CB_PAR_F) {
>> >> >> +		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
>> >> >> +		dev_err(dev, "PAR = 0x%llx\n", phys);
>> >> >> +	}
>> >> >> +	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
>> >> >
>> >> > How does this work for 64k pages?
>> >>
>> >> So at the moment we're always assuming that we're using v7/v8 long
>> >> descriptor format, right? All I see in the spec (14.5.15 SMMU_CBn_PAR)
>> >> is that bits[47:12]=>PA[47:12]... Or am I missing something completely?
>> >
>> > I think you've got 64k pages confused with the short-descriptor format.
>> >
>> > When we use 64k pages with long descriptors, you're masked off bits 15-12 of
>> > the iova above, so you'll have a hole in the physical address afaict.
>>
>> Even with long descriptors the spec says bits 15-12 should come from
>> CB_PAR... It makes no mention of reinterpreting those bits depending on
>> the programmed page granule. The only thing I can conclude from the
>> spec is that hardware should be smart enough to do the right thing with
>> bits 15-12 when the page granule is 64k. Although even if hardware is
>> smart enough I guess CB_PAR[15:12] should be the same as iova[15:12] for
>> the 64k case?
>
> Yeah, fair enough, the code you have should work correctly then.
> Unfortunately, I don't have any suitable hardware on which to test it.

FWIW, I have tested this on a few platforms here. I'll send out a v2
for the series then with the changes you suggested on the iopoll patch.

-Mitch
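For reference, here is a minimal, standalone sketch (not part of the patch) of the PAR-to-PA composition discussed in the thread above. The `compose_pa()`/`main()` scaffolding and the example values are purely illustrative; only the mask and the fault bit mirror what the patch does.

```c
#include <stdint.h>
#include <stdio.h>

#define CB_PAR_F	(1ULL << 0)		/* translation fault flag in PAR */
#define CB_PAR_PA_MASK	0xFFFFFFF000ULL		/* PA bits taken from PAR, as in the patch */

/*
 * Compose the physical address the way arm_smmu_iova_to_phys_hard() does:
 * upper bits from PAR, page offset (bits 11:0) from the IOVA.  Per the
 * discussion above, PAR[15:12] is expected to match iova[15:12] when the
 * page granule is 64K, so the same composition works for either granule.
 */
static uint64_t compose_pa(uint64_t par, uint64_t iova)
{
	return (par & CB_PAR_PA_MASK) | (iova & 0xFFFULL);
}

int main(void)
{
	/* Made-up example values, for illustration only. */
	uint64_t iova = 0x00001234A678ULL;
	uint64_t par  = 0x000080403000ULL;	/* fault bit (bit 0) clear */

	if (par & CB_PAR_F) {
		fprintf(stderr, "translation fault, PAR = 0x%llx\n",
			(unsigned long long)par);
		return 1;
	}

	printf("PA = 0x%llx\n", (unsigned long long)compose_pa(par, iova));
	return 0;
}
```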
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index ff6633d3c9..a6ead91214 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -36,6 +36,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/iommu.h>
+#include <linux/iopoll.h>
 #include <linux/mm.h>
 #include <linux/module.h>
 #include <linux/of.h>
@@ -140,6 +141,7 @@
 #define ID0_S2TS			(1 << 29)
 #define ID0_NTS				(1 << 28)
 #define ID0_SMS				(1 << 27)
+#define ID0_ATOSNS			(1 << 26)
 #define ID0_PTFS_SHIFT			24
 #define ID0_PTFS_MASK			0x2
 #define ID0_PTFS_V8_ONLY		0x2
@@ -231,11 +233,17 @@
 #define ARM_SMMU_CB_TTBR0_HI		0x24
 #define ARM_SMMU_CB_TTBCR		0x30
 #define ARM_SMMU_CB_S1_MAIR0		0x38
+#define ARM_SMMU_CB_PAR_LO		0x50
+#define ARM_SMMU_CB_PAR_HI		0x54
 #define ARM_SMMU_CB_FSR			0x58
 #define ARM_SMMU_CB_FAR_LO		0x60
 #define ARM_SMMU_CB_FAR_HI		0x64
 #define ARM_SMMU_CB_FSYNR0		0x68
 #define ARM_SMMU_CB_S1_TLBIASID		0x610
+#define ARM_SMMU_CB_ATS1PR_LO		0x800
+#define ARM_SMMU_CB_ATS1PR_HI		0x804
+#define ARM_SMMU_CB_ATSR		0x8f0
+#define ATSR_LOOP_TIMEOUT		1000000	/* 1s! */
 
 #define SCTLR_S1_ASIDPNE		(1 << 12)
 #define SCTLR_CFCFG			(1 << 7)
@@ -247,6 +255,10 @@
 #define SCTLR_M				(1 << 0)
 #define SCTLR_EAE_SBOP			(SCTLR_AFE | SCTLR_TRE)
 
+#define CB_PAR_F			(1 << 0)
+
+#define ATSR_ACTIVE			(1 << 0)
+
 #define RESUME_RETRY			(0 << 0)
 #define RESUME_TERMINATE		(1 << 0)
 
@@ -354,6 +366,7 @@ struct arm_smmu_device {
 #define ARM_SMMU_FEAT_TRANS_S1		(1 << 2)
 #define ARM_SMMU_FEAT_TRANS_S2		(1 << 3)
 #define ARM_SMMU_FEAT_TRANS_NESTED	(1 << 4)
+#define ARM_SMMU_FEAT_TRANS_OPS		(1 << 5)
 	u32 features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
@@ -1485,7 +1498,7 @@ static size_t arm_smmu_unmap(struct iommu_domain *domain, unsigned long iova,
 	return ret ? 0 : size;
 }
 
-static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+static phys_addr_t arm_smmu_iova_to_phys_soft(struct iommu_domain *domain,
 					 dma_addr_t iova)
 {
 	pgd_t *pgdp, pgd;
@@ -1518,6 +1531,59 @@ static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
 	return __pfn_to_phys(pte_pfn(pte)) | (iova & ~PAGE_MASK);
 }
 
+static phys_addr_t arm_smmu_iova_to_phys_hard(struct iommu_domain *domain,
+					      dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+	struct arm_smmu_device *smmu = smmu_domain->smmu;
+	struct arm_smmu_cfg *cfg = &smmu_domain->cfg;
+	struct device *dev = smmu->dev;
+	void __iomem *cb_base;
+	u32 tmp;
+	u64 phys;
+
+	cb_base = ARM_SMMU_CB_BASE(smmu) + ARM_SMMU_CB(smmu, cfg->cbndx);
+
+	if (smmu->version == 1) {
+		u32 reg = iova & ~0xFFF;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+	} else {
+		u32 reg = iova & ~0xFFF;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_LO);
+		reg = (iova & ~0xFFF) >> 32;
+		writel_relaxed(reg, cb_base + ARM_SMMU_CB_ATS1PR_HI);
+	}
+
+	if (readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
+			       !(tmp & ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT)) {
+		dev_err(dev,
+			"iova to phys timed out on 0x%pa for %s. Falling back to software table walk.\n",
+			&iova, dev_name(dev));
+		return arm_smmu_iova_to_phys_soft(domain, iova);
+	}
+
+	phys = readl_relaxed(cb_base + ARM_SMMU_CB_PAR_LO);
+	phys |= ((u64) readl_relaxed(cb_base + ARM_SMMU_CB_PAR_HI)) << 32;
+
+	if (phys & CB_PAR_F) {
+		dev_err(dev, "translation fault on %s!\n", dev_name(dev));
+		dev_err(dev, "PAR = 0x%llx\n", phys);
+	}
+	phys = (phys & 0xFFFFFFF000ULL) | (iova & 0x00000FFF);
+
+	return phys;
+}
+
+static phys_addr_t arm_smmu_iova_to_phys(struct iommu_domain *domain,
+					 dma_addr_t iova)
+{
+	struct arm_smmu_domain *smmu_domain = domain->priv;
+
+	if (smmu_domain->smmu->features & ARM_SMMU_FEAT_TRANS_OPS)
+		return arm_smmu_iova_to_phys_hard(domain, iova);
+	return arm_smmu_iova_to_phys_soft(domain, iova);
+}
+
 static int arm_smmu_domain_has_cap(struct iommu_domain *domain,
 				   unsigned long cap)
 {
@@ -1730,6 +1796,11 @@ static int arm_smmu_device_cfg_probe(struct arm_smmu_device *smmu)
 		return -ENODEV;
 	}
 
+	if (smmu->version == 1 || (!(id & ID0_ATOSNS) && (id & ID0_S1TS))) {
+		smmu->features |= ARM_SMMU_FEAT_TRANS_OPS;
+		dev_notice(smmu->dev, "\taddress translation ops\n");
+	}
+
 	if (id & ID0_CTTW) {
 		smmu->features |= ARM_SMMU_FEAT_COHERENT_WALK;
 		dev_notice(smmu->dev, "\tcoherent table walk\n");
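A note on the polling step in the hunk above: `readl_poll_timeout()` (from the iopoll series referenced in this thread) returns 0 once the condition becomes true and a negative error on timeout, which is why a non-zero return triggers the fall-back to the software table walk. A small sketch, with the relevant constants from the patch repeated for context; the `arm_smmu_wait_for_ats()` helper name is illustrative, not part of the patch.

```c
#include <linux/io.h>
#include <linux/iopoll.h>

#define ARM_SMMU_CB_ATSR	0x8f0
#define ATSR_ACTIVE		(1 << 0)
#define ATSR_LOOP_TIMEOUT	1000000		/* 1s, in microseconds */

static int arm_smmu_wait_for_ats(void __iomem *cb_base)
{
	u32 tmp;

	/* Re-read ATSR roughly every 10us until ACTIVE clears or 1s elapses. */
	return readl_poll_timeout(cb_base + ARM_SMMU_CB_ATSR, tmp,
				  !(tmp & ATSR_ACTIVE), 10, ATSR_LOOP_TIMEOUT);
}
```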
Currently, we provide the iommu_ops.iova_to_phys service by doing a
table walk in software to translate IO virtual addresses to physical
addresses. On SMMUs that support it, it can be useful to ask the SMMU
itself to do the translation. This can be used to warm the TLBs for an
SMMU. It can also be useful for testing and hardware validation.

Since the address translation registers are optional on SMMUv2, only
enable hardware translations when using SMMUv1 or when SMMU_IDR0.S1TS=1
and SMMU_IDR0.ATOSNS=0, as described in the ARM SMMU v1-v2 spec.

Signed-off-by: Mitchel Humpherys <mitchelh@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 73 +++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 72 insertions(+), 1 deletion(-)
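As a closing illustration of the use cases called out in the commit message (TLB warming, testing, hardware validation), here is a hypothetical caller-side sketch. Only `iommu_iova_to_phys()` is the real generic IOMMU API; `check_mapping()` and its arguments are made up for this example.

```c
#include <linux/iommu.h>
#include <linux/printk.h>

/*
 * Validate that an IOVA resolves to the expected physical address.  With
 * ARM_SMMU_FEAT_TRANS_OPS set, the driver answers this via the SMMU's
 * address translation registers (warming the TLB as a side effect);
 * otherwise it falls back to the software table walk.
 */
static int check_mapping(struct iommu_domain *domain, dma_addr_t iova,
			 phys_addr_t expected)
{
	phys_addr_t phys = iommu_iova_to_phys(domain, iova);

	if (phys != expected) {
		pr_err("iova 0x%llx maps to 0x%llx, expected 0x%llx\n",
		       (unsigned long long)iova, (unsigned long long)phys,
		       (unsigned long long)expected);
		return -EINVAL;
	}
	return 0;
}
```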