
[3/5] iommu/arm-smmu: Invalidate TLBs properly

Message ID 5665850F.1060406@arm.com (mailing list archive)
State New, archived

Commit Message

Robin Murphy Dec. 7, 2015, 1:09 p.m. UTC
On 07/12/15 11:09, Will Deacon wrote:
> On Fri, Dec 04, 2015 at 05:53:00PM +0000, Robin Murphy wrote:
>> When invalidating an IOVA range potentially spanning multiple pages,
>> such as when removing an entire intermediate-level table, we currently
>> only issue an invalidation for the first IOVA of that range. Since the
>> architecture specifies that address-based TLB maintenance operations
>> target a single entry, an SMMU could feasibly retain live entries for
>> subsequent pages within that unmapped range, which is not good.
>>
>> Make sure we hit every possible entry by iterating over the whole range
>> at the granularity provided by the pagetable implementation.
>>
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/iommu/arm-smmu.c | 19 ++++++++++++++++---
>>   1 file changed, 16 insertions(+), 3 deletions(-)
>
> Can you do something similar for arm-smmu-v3.c as well, please?

Something like this? (untested as I don't have a v3 model set up):

------>8------
From: Robin Murphy <robin.murphy@arm.com>
Date: Mon, 7 Dec 2015 12:52:56 +0000
Subject: [PATCH] iommu/arm-smmu: Fix TLB invalidation

SMMUv3 operates under the same rules as SMMUv2 and the CPU
architectures, so when invalidating an IOVA range we have to hit
every address for which a TLB entry might exist.

To fix this, issue commands for the whole range rather than just the
initial address; as a minor optimisation, try to avoid flooding the
queue by falling back to 'invalidate all' if the range is large.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
  drivers/iommu/arm-smmu-v3.c | 9 ++++++++-
  1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c302b65..afa0b41 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1346,6 +1346,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
  		},
  	};

+	/* If we'd fill the whole queue or more, don't even bother... */
+	if (size >= granule << smmu->cmdq.q.max_n_shift)
+		return arm_smmu_tlb_inv_context(cookie);
+
  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
  		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
  		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
@@ -1354,7 +1358,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
  		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
  	}

-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	do {
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		cmd.tlbi.addr += granule;
+	} while (size -= granule);
  }

  static struct iommu_gather_ops arm_smmu_gather_ops = {
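
[Editorial note: to make the loop's behaviour concrete, here is a minimal, self-contained sketch of the same range-walk pattern. struct tlbi_cmd, issue_cmd() and tlb_inv_range() are hypothetical stand-ins for the driver's command machinery, and the walk assumes size is a non-zero multiple of granule, which the io-pgtable callers arrange.]

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the SMMU TLBI command being built up. */
struct tlbi_cmd {
	uint64_t addr;
};

/* Stand-in for arm_smmu_cmdq_issue_cmd(): just log the command. */
static void issue_cmd(const struct tlbi_cmd *cmd)
{
	printf("TLBI addr=0x%llx\n", (unsigned long long)cmd->addr);
}

/*
 * Walk the whole range in granule-sized steps, issuing one
 * invalidation per step, mirroring the do/while in the patch.
 */
static void tlb_inv_range(uint64_t iova, size_t size, size_t granule)
{
	struct tlbi_cmd cmd = { .addr = iova };

	do {
		issue_cmd(&cmd);
		cmd.addr += granule;
	} while (size -= granule);
}

int main(void)
{
	/* Unmapping a 2MiB region at 4KiB granule => 512 commands. */
	tlb_inv_range(0x40000000, 2 << 20, 4096);
	return 0;
}

[Run standalone, this prints 512 TLBI commands for a 2MiB range at 4KiB granule; bounding exactly that kind of flood is what the queue-fill heuristic in the diff above was attempting.]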

Comments

Will Deacon Dec. 7, 2015, 1:34 p.m. UTC | #1
On Mon, Dec 07, 2015 at 01:09:35PM +0000, Robin Murphy wrote:
> On 07/12/15 11:09, Will Deacon wrote:
> >On Fri, Dec 04, 2015 at 05:53:00PM +0000, Robin Murphy wrote:
> >>When invalidating an IOVA range potentially spanning multiple pages,
> >>such as when removing an entire intermediate-level table, we currently
> >>only issue an invalidation for the first IOVA of that range. Since the
> >>architecture specifies that address-based TLB maintenance operations
> >>target a single entry, an SMMU could feasibly retain live entries for
> >>subsequent pages within that unmapped range, which is not good.
> >>
> >>Make sure we hit every possible entry by iterating over the whole range
> >>at the granularity provided by the pagetable implementation.
> >>
> >>Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> >>---
> >>  drivers/iommu/arm-smmu.c | 19 ++++++++++++++++---
> >>  1 file changed, 16 insertions(+), 3 deletions(-)
> >
> >Can you do something similar for arm-smmu-v3.c as well, please?
> 
> Something like this? (untested as I don't have a v3 model set up):
> 
> ------>8------
> From: Robin Murphy <robin.murphy@arm.com>
> Date: Mon, 7 Dec 2015 12:52:56 +0000
> Subject: [PATCH] iommu/arm-smmu: Fix TLB invalidation
> 
> SMMUv3 operates under the same rules as SMMUv2 and the CPU
> architectures, so when invalidating an IOVA range we have to hit
> every address for which a TLB entry might exist.
> 
> To fix this, issue commands for the whole range rather than just the
> initial address; as a minor optimisation, try to avoid flooding the
> queue by falling back to 'invalidate all' if the range is large.
> 
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/iommu/arm-smmu-v3.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index c302b65..afa0b41 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -1346,6 +1346,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  		},
>  	};
> 
> +	/* If we'd fill the whole queue or more, don't even bother... */
> +	if (size >= granule << smmu->cmdq.q.max_n_shift)
> +		return arm_smmu_tlb_inv_context(cookie);

Let's not bother with this heuristic for now. It's not at all clear where
the trade-off is between CPU time and I/O latency, and this check doesn't
take into account the current state of the command queue and/or how quickly
it drains anyway.

>  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
>  		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
>  		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
> @@ -1354,7 +1358,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
>  		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
>  	}
> 
> -	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +	do {
> +		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
> +		cmd.tlbi.addr += granule;
> +	} while (size -= granule);

This bit looks fine to me, thanks.

Will
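
[Editorial note: the intent of the dropped heuristic is to compare the number of commands the range walk would emit (size / granule) against the queue's capacity (1 << max_n_shift entries), and fall back to a full context invalidation when the walk would be at least that expensive. A minimal sketch of that arithmetic follows; cmdq_model and range_would_fill_queue() are hypothetical names, not driver API.]

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the queue bookkeeping: the real driver keeps
 * log2 of the queue depth in smmu->cmdq.q.max_n_shift. */
struct cmdq_model {
	unsigned int max_n_shift;	/* queue holds 1 << max_n_shift entries */
};

/* True when walking the range granule-by-granule would emit at least
 * as many commands as the queue can hold: size / granule commands
 * versus 1 << max_n_shift slots, rearranged to avoid a division. */
static bool range_would_fill_queue(const struct cmdq_model *q,
				   uint64_t size, uint64_t granule)
{
	return size >= granule << q->max_n_shift;
}

[As Will points out, a static comparison like this sees neither the queue's current occupancy nor its drain rate, which is why the check was dropped.]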

Patch

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c302b65..afa0b41 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1346,6 +1346,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
  		},
  	};

+	/* If we'd fill the whole queue or more, don't even bother... */
+	if (size >= granule << smmu->cmdq.q.max_n_shift)
+		return arm_smmu_tlb_inv_context(cookie);
+
  	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) {
  		cmd.opcode	= CMDQ_OP_TLBI_NH_VA;
  		cmd.tlbi.asid	= smmu_domain->s1_cfg.cd.asid;
@@ -1354,7 +1358,10 @@ static void arm_smmu_tlb_inv_range_nosync(unsigned long iova, size_t size,
  		cmd.tlbi.vmid	= smmu_domain->s2_cfg.vmid;
  	}

-	arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+	do {
+		arm_smmu_cmdq_issue_cmd(smmu, &cmd);
+		cmd.tlbi.addr += granule;
+	} while (size -= granule);
  }

  static struct iommu_gather_ops arm_smmu_gather_ops = {