
OMAP: iommu flush page table entries from L1 and L2 cache

Message ID 1302817968-28516-1-git-send-email-fernando.lugo@ti.com (mailing list archive)
State New, archived

Commit Message

Guzman Lugo, Fernando April 14, 2011, 9:52 p.m. UTC
From: Ramesh Gupta <grgupta@ti.com>

This patch is to flush the iommu page table entries from L1 and L2
caches using dma_map_single. This also simplifies the implementation
by removing the functions  flush_iopgd_range/flush_iopte_range.

Change-Id: I19c0bf437d75c79084b2fa28c7da50a4c84b91a0
Signed-off-by: Ramesh Gupta <grgupta@ti.com>
Signed-off-by: Hari Kanigeri <h-kanigeri2@ti.com>
Signed-off-by: Fernando Guzman Lugo <fernando.lugo@ti.com>
---
 arch/arm/plat-omap/iommu.c |   41 ++++++++++-------------------------------
 1 files changed, 10 insertions(+), 31 deletions(-)

Comments

Russell King - ARM Linux April 14, 2011, 10:30 p.m. UTC | #1
On Thu, Apr 14, 2011 at 04:52:48PM -0500, Fernando Guzman Lugo wrote:
> From: Ramesh Gupta <grgupta@ti.com>
> 
> This patch is to flush the iommu page table entries from L1 and L2
> caches using dma_map_single. This also simplifies the implementation
> by removing the functions  flush_iopgd_range/flush_iopte_range.

No.  This usage is just wrong.  If you're going to use the DMA API then
unmap it, otherwise the DMA API debugging will go awol.
--
To unsubscribe from this list: send the line "unsubscribe linux-omap" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Cho KyongHo April 15, 2011, 2:24 a.m. UTC | #2
Hi, Russell.

I think we need cache maintenance operations that act on the inner and
outer caches at the same time.

Even though the DMA APIs are meant for I/O mapping rather than cache
maintenance, they are handy for cache maintenance because they operate
on both the inner and outer caches.

As you know, from the programmer's point of view the inner cache of a
Cortex-A core is virtually addressed while the outer cache is
physically addressed. We need the logical (virtual) address and the
physical address at the same time to clean or invalidate the inner and
outer caches. That means we need to translate the logical address to a
physical one, and that is sometimes not trivial.

Otherwise, the kernel will end up containing many similar routines that do the same thing.

On Fri, Apr 15, 2011 at 7:30 AM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Apr 14, 2011 at 04:52:48PM -0500, Fernando Guzman Lugo wrote:
>> From: Ramesh Gupta <grgupta@ti.com>
>>
>> This patch is to flush the iommu page table entries from L1 and L2
>> caches using dma_map_single. This also simplifies the implementation
>> by removing the functions  flush_iopgd_range/flush_iopte_range.
>
> No.  This usage is just wrong.  If you're going to use the DMA API then
> unmap it, otherwise the DMA API debugging will go awol.
Russell King - ARM Linux April 15, 2011, 8:12 a.m. UTC | #3
On Fri, Apr 15, 2011 at 11:24:16AM +0900, KyongHo Cho wrote:
> That means we need to translate logical to physical address and it is
> sometimes not trivial.

What do you mean by "sometimes not trivial"?  The DMA API does nothing more
than virt_to_phys(virt) to get the physical address.  It's _that_ simple.

If virt_to_phys(virt) is likely to fail, there's protection in the DMA API
to BUG_ON() in that case.

> Finally, the kernel will contain many similar routines that do same thing.

So when we get coherent DMA, you won't care that the DMA API functions
start doing nothing with caches?
Gupta, Ramesh April 15, 2011, 11:26 a.m. UTC | #4
Russell,

On Thu, Apr 14, 2011 at 5:30 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Apr 14, 2011 at 04:52:48PM -0500, Fernando Guzman Lugo wrote:
>> From: Ramesh Gupta <grgupta@ti.com>
>>
>> This patch is to flush the iommu page table entries from L1 and L2
>> caches using dma_map_single. This also simplifies the implementation
>> by removing the functions  flush_iopgd_range/flush_iopte_range.
>
> No.  This usage is just wrong.  If you're going to use the DMA API then
> unmap it, otherwise the DMA API debugging will go awol.
>

Thank you for the comments. This particular memory is only ever
written by the A9 for MMU programming and only read by the slave
processor; that is the reason for not calling the unmap. I can re-look
into the changes to call unmap in a proper way, as this impacts the
DMA API. Are there any other ways to only flush the memory from the
L1/L2 caches?

 regards
Ramesh Gupta G
Gupta, Ramesh Aug. 11, 2011, 7:28 p.m. UTC | #5
Hi Russel,


On Thu, Apr 28, 2011 at 11:48 AM, Gupta, Ramesh <grgupta@ti.com> wrote:
> Hi Russel,
>
> On Thu, Apr 28, 2011 at 8:40 AM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
>> On Fri, Apr 15, 2011 at 06:26:40AM -0500, Gupta, Ramesh wrote:
>>> Russell,
>>>
>>> On Thu, Apr 14, 2011 at 5:30 PM, Russell King - ARM Linux
>>> <linux@arm.linux.org.uk> wrote:
>>> > On Thu, Apr 14, 2011 at 04:52:48PM -0500, Fernando Guzman Lugo wrote:
>>> >> From: Ramesh Gupta <grgupta@ti.com>
>>> >>
>>> >> This patch is to flush the iommu page table entries from L1 and L2
>>> >> caches using dma_map_single. This also simplifies the implementation
>>> >> by removing the functions  flush_iopgd_range/flush_iopte_range.
>>> >
>>> > No.  This usage is just wrong.  If you're going to use the DMA API then
>>> > unmap it, otherwise the DMA API debugging will go awol.
>>> >
>>>
>>> Thank you for the comments, this particular memory is always a write
>>> from the A9 for MMU programming and
>>> only read from the slave processor, that is the reason for not calling
>>> the unmap. I can re-look into the changes to call
>>> unmap in a proper way as this impacts the DMA API.
>>> Are there any other ways to perform only flush the memory from L1/L2 caches?
>>
>> We _could_ invent a new API to deal with this, which is probably going
>> to be far better in the longer term for page table based iommus.  That's
>> going to need some thought - eg, do we need to pass a struct device
>> argument for the iommu cache flushing so we know whether we need to flush
>> or not (eg, if we have cache coherent iommus)...

My apologies for the late mail on this topic.

Do you have any other requirements in mind for this new API?

Could we use the existing dmac_flush_range() and outer_flush_range()
for this purpose instead of a new API?

I see a comment in arch/arm/include/asm/cacheflush.h saying _not_ to
use these APIs directly, but I don't really understand the reason for
that.

I would appreciate any inputs on this.

thank you and regards
Ramesh Gupta G
Russell King - ARM Linux Aug. 11, 2011, 10:29 p.m. UTC | #6
On Thu, Aug 11, 2011 at 02:28:39PM -0500, Gupta, Ramesh wrote:
> Hi Russel,

grr.

> On Thu, Apr 28, 2011 at 11:48 AM, Gupta, Ramesh <grgupta@ti.com> wrote:
> > Hi Russel,
> >
> > On Thu, Apr 28, 2011 at 8:40 AM, Russell King - ARM Linux
> > <linux@arm.linux.org.uk> wrote:
> >> We _could_ invent a new API to deal with this, which is probably going
> >> to be far better in the longer term for page table based iommus.  That's
> >> going to need some thought - eg, do we need to pass a struct device
> >> argument for the iommu cache flushing so we know whether we need to flush
> >> or not (eg, if we have cache coherent iommus)...
> 
> my apologies for a late mail on this topic.
> 
> do you think of any other requirements for this new API?
> 
> Could we use the existing dmac_flush_range(), outer_flush_range()
> for this purpose instead of a new API?
> 
> I see a comment in the arch/arm/include/asm/cacheflush.h
> for  _not_ to use these APIs directly, but I am not really understand
> the reason for that.

When I create APIs, I create them to solve a _purpose_ to code which wants
to do something.  They're not created to provide some facility which can
be re-used for unrelated stuff.

This has been proven many times to be the correct approach.  Over time,
things change.

Let's say for arguments sake that you decide to use the DMA API stuff
to achieve your goals.  Then, lets say that ARM DMA becomes fully
cache coherent, but your IOMMU tables still need to be flushed from
the L1 caches.

Suddenly, dmac_flush_range() starts doing absolutely nothing.  Your
driver breaks.  I get whinged at because a previously working driver
stops working.  In effect, that usage _prevents_ me making the changes
necessary to keep the core architecture support moving forward as
things develop.  Or, alternatively I just ignore your driver, make the
changes anyway and leave it to rot.

So, APIs get created to provide a purpose.  Like - handling the DMA
issues when mapping a buffer to DMA.  Like handling the DMA issues
when unmapping a buffer from DMA.  If you start using those _because_
they happen to clean or invalidate the cache for you, you're really
asking for your driver to be broken at some point in the future.

What is far better to do is to ensure that we have the right APIs in
place for the purposes for which they are to be used.  So, if we need
an API to flush out the IOMMU page table entries, then that's what
we need, not some bodged lets-reuse-the-dma-flushing-functions thing.
Inside the processors implementation, yes, it may well be the same
thing, but that's a decision for the _processor_ support code to make,
not the IOMMU writer.

As to what shape a new API should be - as I said above, maybe it should
take a struct device argument, virtual base address and size.  Or maybe
if we don't have any coherent IOMMUs then just ignore the device
argument for the time being, and just pass the virtual base address and
size.

The next issue is whether it should require the virtual base address to
be in the kernel direct mapped region.  If you're touching L2, then
that's a yes, because we need to use virt_to_phys on it to get at the
phys address for the L2 operations.

So, I think: extend the cpu cache operations structure to have a method
for dealing with IOMMUs.  Add an inline function to deal with calling
that, and the L2 ops if that's what's required.
Gupta, Ramesh Aug. 12, 2011, 4:05 p.m. UTC | #7
On Thu, Aug 11, 2011 at 5:29 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Thu, Aug 11, 2011 at 02:28:39PM -0500, Gupta, Ramesh wrote:
>> Hi Russel,
>
> grr.

Sorry, typo.

>
>> On Thu, Apr 28, 2011 at 11:48 AM, Gupta, Ramesh <grgupta@ti.com> wrote:
>> > Hi Russel,
>> >
>> > On Thu, Apr 28, 2011 at 8:40 AM, Russell King - ARM Linux
>> > <linux@arm.linux.org.uk> wrote:
>> >> We _could_ invent a new API to deal with this, which is probably going
>> >> to be far better in the longer term for page table based iommus.  That's
>> >> going to need some thought - eg, do we need to pass a struct device
>> >> argument for the iommu cache flushing so we know whether we need to flush
>> >> or not (eg, if we have cache coherent iommus)...
>>
>> my apologies for a late mail on this topic.
>>
>> do you think of any other requirements for this new API?
>>
>> Could we use the existing dmac_flush_range(), outer_flush_range()
>> for this purpose instead of a new API?
>>
>> I see a comment in the arch/arm/include/asm/cacheflush.h
>> for  _not_ to use these APIs directly, but I am not really understand
>> the reason for that.
>
> When I create APIs, I create them to solve a _purpose_ to code which wants
> to do something.  They're not created to provide some facility which can
> be re-used for unrelated stuff.
>
> This has been proven many times to be the correct approach.  Over time,
> things change.
>

I agree.

> Let's say for arguments sake that you decide to use the DMA API stuff
> to achieve your goals.  Then, lets say that ARM DMA becomes fully
> cache coherent, but your IOMMU tables still need to be flushed from
> the L1 caches.
>
> Suddenly, dmac_flush_range() starts doing absolutely nothing.  Your
> driver breaks.  I get whinged at because a previously working driver
> stops working.  In effect, that usage _prevents_ me making the changes
> necessary to keep the core architecture support moving forward as
> things develop.  Or, alternatively I just ignore your driver, make the
> changes anyway and leave it to rot.
>
> So, APIs get created to provide a purpose.  Like - handling the DMA
> issues when mapping a buffer to DMA.  Like handling the DMA issues
> when unmapping a buffer from DMA.  If you start using those _because_
> they happen to clean or invalidate the cache for you, you're really
> asking for your driver to be broken at some point in the future.
>
> What is far better to do is to ensure that we have the right APIs in
> place for the purposes for which they are to be used.  So, if we need
> an API to flush out the IOMMU page table entries, then that's what
> we need, not some bodged lets-reuse-the-dma-flushing-functions thing.
> Inside the processors implementation, yes, it may well be the same
> thing, but that's a decision for the _processor_ support code to make,
> not the IOMMU writer.

I completely agree, thank you for explaining the need to use the correct API.


> As to what shape a new API should be - as I said above, maybe it should
> take a struct device argument, virtual base address and size.  Or maybe
> if we don't have any coherent IOMMUs then just ignore the device
> argument for the time being, and just pass the virtual base address and
> size.
>
> The next issue is whether it should require the virtual base address to
> be in the kernel direct mapped region.  If you're touching L2, then
> that's a yes, because we need to use virt_to_phys on it to get at the
> phys address for the L2 operations.
>
> So, I think: extend the cpu cache operations structure to have a method
> for dealing with IOMMUs.  Add an inline function to deal with calling
> that, and the L2 ops if that's what's required.

Once again thanks for the API requirements, I will work on this and send
the patches for review.

Regards
Ramesh Gupta G

Patch

diff --git a/arch/arm/plat-omap/iommu.c b/arch/arm/plat-omap/iommu.c
index 8a51fd5..1fb5e41 100644
--- a/arch/arm/plat-omap/iommu.c
+++ b/arch/arm/plat-omap/iommu.c
@@ -18,6 +18,7 @@ 
 #include <linux/ioport.h>
 #include <linux/clk.h>
 #include <linux/platform_device.h>
+#include <linux/dma-mapping.h>
 
 #include <asm/cacheflush.h>
 
@@ -466,29 +467,6 @@  EXPORT_SYMBOL_GPL(foreach_iommu_device);
 
 #endif /* CONFIG_OMAP_IOMMU_DEBUG_MODULE */
 
-/*
- *	H/W pagetable operations
- */
-static void flush_iopgd_range(u32 *first, u32 *last)
-{
-	/* FIXME: L2 cache should be taken care of if it exists */
-	do {
-		asm("mcr	p15, 0, %0, c7, c10, 1 @ flush_pgd"
-		    : : "r" (first));
-		first += L1_CACHE_BYTES / sizeof(*first);
-	} while (first <= last);
-}
-
-static void flush_iopte_range(u32 *first, u32 *last)
-{
-	/* FIXME: L2 cache should be taken care of if it exists */
-	do {
-		asm("mcr	p15, 0, %0, c7, c10, 1 @ flush_pte"
-		    : : "r" (first));
-		first += L1_CACHE_BYTES / sizeof(*first);
-	} while (first <= last);
-}
-
 static void iopte_free(u32 *iopte)
 {
 	/* Note: freed iopte's must be clean ready for re-use */
@@ -515,7 +493,7 @@  static u32 *iopte_alloc(struct iommu *obj, u32 *iopgd, u32 da)
 			return ERR_PTR(-ENOMEM);
 
 		*iopgd = virt_to_phys(iopte) | IOPGD_TABLE;
-		flush_iopgd_range(iopgd, iopgd);
+		dma_map_single(obj->dev, iopgd, 1, DMA_TO_DEVICE);
 
 		dev_vdbg(obj->dev, "%s: a new pte:%p\n", __func__, iopte);
 	} else {
@@ -544,7 +522,7 @@  static int iopgd_alloc_section(struct iommu *obj, u32 da, u32 pa, u32 prot)
 	}
 
 	*iopgd = (pa & IOSECTION_MASK) | prot | IOPGD_SECTION;
-	flush_iopgd_range(iopgd, iopgd);
+	dma_map_single(obj->dev, iopgd, 1, DMA_TO_DEVICE);
 	return 0;
 }
 
@@ -561,7 +539,7 @@  static int iopgd_alloc_super(struct iommu *obj, u32 da, u32 pa, u32 prot)
 
 	for (i = 0; i < 16; i++)
 		*(iopgd + i) = (pa & IOSUPER_MASK) | prot | IOPGD_SUPER;
-	flush_iopgd_range(iopgd, iopgd + 15);
+	dma_map_single(obj->dev, iopgd, 16, DMA_TO_DEVICE);
 	return 0;
 }
 
@@ -574,7 +552,7 @@  static int iopte_alloc_page(struct iommu *obj, u32 da, u32 pa, u32 prot)
 		return PTR_ERR(iopte);
 
 	*iopte = (pa & IOPAGE_MASK) | prot | IOPTE_SMALL;
-	flush_iopte_range(iopte, iopte);
+	dma_map_single(obj->dev, iopte, 1, DMA_TO_DEVICE);
 
 	dev_vdbg(obj->dev, "%s: da:%08x pa:%08x pte:%p *pte:%08x\n",
 		 __func__, da, pa, iopte, *iopte);
@@ -599,7 +577,7 @@  static int iopte_alloc_large(struct iommu *obj, u32 da, u32 pa, u32 prot)
 
 	for (i = 0; i < 16; i++)
 		*(iopte + i) = (pa & IOLARGE_MASK) | prot | IOPTE_LARGE;
-	flush_iopte_range(iopte, iopte + 15);
+	dma_map_single(obj->dev, iopte, 16, DMA_TO_DEVICE);
 	return 0;
 }
 
@@ -703,7 +681,8 @@  static size_t iopgtable_clear_entry_core(struct iommu *obj, u32 da)
 		}
 		bytes *= nent;
 		memset(iopte, 0, nent * sizeof(*iopte));
-		flush_iopte_range(iopte, iopte + (nent - 1) * sizeof(*iopte));
+		dma_map_single(obj->dev, iopte, nent * sizeof(*iopte),
+				 DMA_TO_DEVICE);
 
 		/*
 		 * do table walk to check if this table is necessary or not
@@ -725,7 +704,7 @@  static size_t iopgtable_clear_entry_core(struct iommu *obj, u32 da)
 		bytes *= nent;
 	}
 	memset(iopgd, 0, nent * sizeof(*iopgd));
-	flush_iopgd_range(iopgd, iopgd + (nent - 1) * sizeof(*iopgd));
+	dma_map_single(obj->dev, iopgd, nent * sizeof(*iopgd), DMA_TO_DEVICE);
 out:
 	return bytes;
 }
@@ -770,7 +749,7 @@  static void iopgtable_clear_entry_all(struct iommu *obj)
 			iopte_free(iopte_offset(iopgd, 0));
 
 		*iopgd = 0;
-		flush_iopgd_range(iopgd, iopgd);
+		dma_map_single(obj->dev, iopgd, 1, DMA_TO_DEVICE);
 	}
 
 	flush_iotlb_all(obj);