diff mbox

[V4] ARM: handle user space mapped pages in flush_kernel_dcache_page

Message ID 1368336956-6693-1-git-send-email-gmbnomis@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Simon Baatz May 12, 2013, 5:35 a.m. UTC
Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
it needs to handle are kernel mapped only.  However, for example when doing
direct I/O, pages with user space mappings may occur.

Thus, continue to do lazy flushing if there are no user space mappings.
Otherwise, flush the kernel cache lines directly.

Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Russell King <linux@arm.linux.org.uk>
---

Changes:

in V4:
- get back to simpler version of flush_kernel_dcache_page() and use the
  logic from __flush_dcache_page() to flush the kernel mapping (which
  also takes care of highmem pages)

in V3:
- Followed Catalin's suggestion to reverse the order of the patches

in V2:
- flush_kernel_dcache_page() follows flush_dcache_page() now, except that it
  does not flush the user mappings


 arch/arm/include/asm/cacheflush.h |    4 +---
 arch/arm/mm/flush.c               |   38 +++++++++++++++++++++++++++++++------
 2 files changed, 33 insertions(+), 9 deletions(-)

Comments

Catalin Marinas May 23, 2013, 10:43 a.m. UTC | #1
On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> it needs to handle are kernel mapped only.  However, for example when doing
> direct I/O, pages with user space mappings may occur.
> 
> Thus, continue to do lazy flushing if there are no user space mappings.
> Otherwise, flush the kernel cache lines directly.
> 
> Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Russell King <linux@arm.linux.org.uk>

An issue is that for kunmap_atomic() && VIVT we flush the same page
twice. I don't think we should remove the cache flushing in
__kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
flush_kernel_dcache_page() (unless we just assume that highmem && vivt
don't work together). Flushing the VIVT cache on kunmap is essential
since we just lose the alias after that.

> +void flush_kernel_dcache_page(struct page *page)
> +{
> +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {

So you could add the ((cache_is_vivt() || cache_is_vipt_aliasing()) &&
!PageHighMem(page)) check back (as it was prior to commit f8b63c184,
actually reverting it) and the mapping tests as a possible optimisation.
But in this case there is no need to split __flush_dcache_page() in two
since we don't care about the PageHighMem() case.

And we need to fix kunmap() with VIVT to flush the cache as well (in
case anyone uses highmem on VIVT hardware).
Nicolas Pitre May 25, 2013, 3:53 a.m. UTC | #2
On Thu, 23 May 2013, Catalin Marinas wrote:

> An issue is that for kunmap_atomic() && VIVT we flush the same page
> twice. I don't think we should remove the cache flushing in
> __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> don't work together).

VIVT and highmem do work together.  Highmem for ARM was in fact 
developed on such a platform.


Nicolas
Simon Baatz May 27, 2013, 9:42 p.m. UTC | #3
Hi Catalin,

On Thu, May 23, 2013 at 11:43:03AM +0100, Catalin Marinas wrote:
> On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > it needs to handle are kernel mapped only.  However, for example when doing
> > direct I/O, pages with user space mappings may occur.
> > 
> > Thus, continue to do lazy flushing if there are no user space mappings.
> > Otherwise, flush the kernel cache lines directly.
> > 
> > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Russell King <linux@arm.linux.org.uk>
> 
> An issue is that for kunmap_atomic() && VIVT we flush the same page
> twice. I don't think we should remove the cache flushing in
> __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> don't work together). Flushing the VIVT cache on kunmap is essential
> since we just lose the alias after that.

Not sure if I am missing something here. From our previous
discussion:

> > > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > > the kernel mapping. (The last else clause looks strange though)
> > > 
> > > I think it makes sense to reuse this logic in
> > > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > > then kmap_high_get() should return the actual address. If it was mapped
> > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > > useful because kunmap_atomic() would flush VIVT caches anyway.


As you say, this logic does not flush on VIVT if the page was mapped
via kmap_atomic() (and kmap_atomic() could not reuse an existing
mapping via kmap_high_get())

kmap_atomic() tries to reuse an existing mapping and only allocates a
new dedicated kmap if it finds none.  Consequently, __kunmap_atomic()
only flushes in the latter case (because we lose the alias).

Thus, no double flushing should occur here. If kunmap_atomic() will
flush, __flush_kernel_dcache_page() will not and vice versa.

> > +void flush_kernel_dcache_page(struct page *page)
> > +{
> > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> 
> So you could add the ((cache_is_vivt() || cache_is_vipt_aliasing()) &&
> !PageHighMem(page)) check back (as it was prior to commit f8b63c184,
> actually reverting it) and the mapping tests as a possible optimisation.
> But in this case there is no need to split __flush_dcache_page() in two
> since we don't care about the PageHighMem() case.
> 
> And we need to fix kunmap() with VIVT to flush the cache as well (in
> case anyone uses highmem on VIVT hardware).

I still think your previous comment that we need to handle the
highmem case as well was right and reusing the respective part of
__flush_dcache_page() does the right thing.

But as said, perhaps I am missing something here.

- Simon
Catalin Marinas May 28, 2013, 9:55 a.m. UTC | #4
On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> On Thu, 23 May 2013, Catalin Marinas wrote:
> 
> > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > twice. I don't think we should remove the cache flushing in
> > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > don't work together).
> 
> VIVT and highmem do work together.  Highmem for ARM was in fact 
> developed on such a platform.

Thanks for confirming. Do you remember why kunmap() doesn't (need to)
flush the cache alias (for VIVT)? Or if it does, where?
Catalin Marinas May 28, 2013, 10:20 a.m. UTC | #5
On Mon, May 27, 2013 at 10:42:45PM +0100, Simon Baatz wrote:
> On Thu, May 23, 2013 at 11:43:03AM +0100, Catalin Marinas wrote:
> > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > it needs to handle are kernel mapped only.  However, for example when doing
> > > direct I/O, pages with user space mappings may occur.
> > > 
> > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > Otherwise, flush the kernel cache lines directly.
> > > 
> > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > Cc: Russell King <linux@arm.linux.org.uk>
> > 
> > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > twice. I don't think we should remove the cache flushing in
> > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > don't work together). Flushing the VIVT cache on kunmap is essential
> > since we just lose the alias after that.
> 
> Not sure if I am missing something here. From our previous
> discussion:
> 
> > > > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > > > the kernel mapping. (The last else clause looks strange though)
> > > > 
> > > > I think it makes sense to reuse this logic in
> > > > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > > > then kmap_high_get() should return the actual address. If it was mapped
> > > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > > > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > > > useful because kunmap_atomic() would flush VIVT caches anyway.

I think the context was that page_address() is not always valid after
kmap_atomic(), so reusing the same logic would make sense.

But the key point I try to understand is why kunmap() does not need to
flush the VIVT cache. If it does need to, then we can simplify your
patch.

> As you say, this logic does not flush on VIVT if the page was mapped
> via kmap_atomic() (and kmap_atomic() could not reuse an existing
> mapping via kmap_high_get())
> 
> kmap_atomic() tries to reuse an existing mapping and only allocates a
> new dedicated kmap if it finds none.  Consequently, __kunmap_atomic()
> only flushes in the latter case (because we lose the alias).
> 
> Thus, no double flushing should occur here. If kunmap_atomic() will
> flush, __flush_kernel_dcache_page() will not and vice versa.

That's correct for the current code. Let's wait for some insight from
Nico or Russell on whether kunmap() needs to lush the VIVT cache as
well. If it doesn't need to, then my original comment made sense.
Nicolas Pitre May 28, 2013, 5:52 p.m. UTC | #6
On Tue, 28 May 2013, Catalin Marinas wrote:

> On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> > On Thu, 23 May 2013, Catalin Marinas wrote:
> > 
> > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > twice. I don't think we should remove the cache flushing in
> > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > don't work together).
> > 
> > VIVT and highmem do work together.  Highmem for ARM was in fact 
> > developed on such a platform.
> 
> Thanks for confirming. Do you remember why kunmap() doesn't (need to)
> flush the cache alias (for VIVT)? Or if it does, where?

kunmap() calls kunmap_high() which actually does not unmap anything but 
only decrement a use count.  The page is kept around in case it is 
kmap()'d again, in which case the kmap() will be almost free and the 
page still warm.

It is only when the kmap() for a new page runs out of free pkmap  
entries that the unused but still mapped pages are unmapped.  This 
unmapping is done in flush_all_zero_pkmaps() where all the unused pages 
are all unmapped at once.  The cache for all those pages is flushed with 
flush_cache_kmaps() which on ARM maps to flush_cache_all() when the 
cache is VIVT.


Nicolas
Nicolas Pitre May 28, 2013, 6:50 p.m. UTC | #7
On Tue, 28 May 2013, Catalin Marinas wrote:

> On Mon, May 27, 2013 at 10:42:45PM +0100, Simon Baatz wrote:
> > On Thu, May 23, 2013 at 11:43:03AM +0100, Catalin Marinas wrote:
> > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > direct I/O, pages with user space mappings may occur.
> > > > 
> > > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > > Otherwise, flush the kernel cache lines directly.
> > > > 
> > > > Signed-off-by: Simon Baatz <gmbnomis@gmail.com>
> > > > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > > > Cc: Russell King <linux@arm.linux.org.uk>
> > > 
> > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > twice. I don't think we should remove the cache flushing in
> > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > don't work together). Flushing the VIVT cache on kunmap is essential
> > > since we just lose the alias after that.
> > 
> > Not sure if I am missing something here. From our previous
> > discussion:
> > 
> > > > > > We probably should reuse it in flush_kernel_dcache_page() to flush
> > > > > > the kernel mapping. (The last else clause looks strange though)
> > > > > 
> > > > > I think it makes sense to reuse this logic in
> > > > > flush_kernel_dcache_page(). If the page is already mapped with kmap,
> > > > > then kmap_high_get() should return the actual address. If it was mapped
> > > > > with kmap_atomic, kmap_high_get() would return NULL, hence the 'else'
> > > > > clause and the additional kmap_atomic(). The cache_is_vipt() check is
> > > > > useful because kunmap_atomic() would flush VIVT caches anyway.
> 
> I think the context was that page_address() is not always valid after
> kmap_atomic(), so reusing the same logic would make sense.

Following my explanation of kunmap()... remember that kunmap()'d pages 
are not actually unmaped right away.  So the only thing that 
kmap_high_get() does is to look for such a mapping and pin it with an 
increased use count if it exists, or return NULL if there is no such 
mapping.  In the former case page_address() is valid, in the second case 
it is NULL.  Yet, after kunmap_atomic() (or kunmap() as well) you never 
know when page_address() might suddenly return NULL if it returned an 
address unless you pin the page down with kmap_high_get().

> But the key point I try to understand is why kunmap() does not need to
> flush the VIVT cache. If it does need to, then we can simplify your
> patch.

It does not and should not for performance reasons.

> > As you say, this logic does not flush on VIVT if the page was mapped
> > via kmap_atomic() (and kmap_atomic() could not reuse an existing
> > mapping via kmap_high_get())
> > 
> > kmap_atomic() tries to reuse an existing mapping and only allocates a
> > new dedicated kmap if it finds none.  Consequently, __kunmap_atomic()
> > only flushes in the latter case (because we lose the alias).

That is exact.

> > 
> > Thus, no double flushing should occur here. If kunmap_atomic() will
> > flush, __flush_kernel_dcache_page() will not and vice versa.
> 
> That's correct for the current code. Let's wait for some insight from
> Nico or Russell on whether kunmap() needs to lush the VIVT cache as
> well. If it doesn't need to, then my original comment made sense.

I hope I've been able to shed some light on the way highmem caching 
works.  I'm not sure I fully understand what the patch in this thread 
does though, but I've never fully grokked the subtleties behind 
flush_dcache_page either.


Nicolas
Catalin Marinas May 29, 2013, 10:37 a.m. UTC | #8
On Tue, May 28, 2013 at 06:52:27PM +0100, Nicolas Pitre wrote:
> On Tue, 28 May 2013, Catalin Marinas wrote:
> 
> > On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> > > On Thu, 23 May 2013, Catalin Marinas wrote:
> > > 
> > > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > > twice. I don't think we should remove the cache flushing in
> > > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > > don't work together).
> > > 
> > > VIVT and highmem do work together.  Highmem for ARM was in fact 
> > > developed on such a platform.
> > 
> > Thanks for confirming. Do you remember why kunmap() doesn't (need to)
> > flush the cache alias (for VIVT)? Or if it does, where?
> 
> kunmap() calls kunmap_high() which actually does not unmap anything but 
> only decrement a use count.  The page is kept around in case it is 
> kmap()'d again, in which case the kmap() will be almost free and the 
> page still warm.
> 
> It is only when the kmap() for a new page runs out of free pkmap  
> entries that the unused but still mapped pages are unmapped.  This 
> unmapping is done in flush_all_zero_pkmaps() where all the unused pages 
> are all unmapped at once.  The cache for all those pages is flushed with 
> flush_cache_kmaps() which on ARM maps to flush_cache_all() when the 
> cache is VIVT.

Thanks for clarification Nico. So if kmap_high_get() returns NULL, there
are no high aliases left in the cache.
Catalin Marinas May 29, 2013, 11:05 a.m. UTC | #9
On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> it needs to handle are kernel mapped only.  However, for example when doing
> direct I/O, pages with user space mappings may occur.

After Nico's clarification, I think the original commit introducing this
function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
their flushing could be deferred for a long time.

For my understanding (if I re-read this tread) - basically code like
this should not leave the user mapping inconsistent:

kmap()
...
flush_kernel_dcache_page()
kunmap()

If we use the atomic variants, we get the cache flushing automatically
but the kunmap_high() does not flush the cache immediately, so we need
to handle it in flush_kernel_dcache_page().

> Thus, continue to do lazy flushing if there are no user space mappings.
> Otherwise, flush the kernel cache lines directly.
...
>  /*
> + * Ensure cache coherency for kernel mapping of this page.
> + *
> + * If the page only exists in the page cache and there are no user
> + * space mappings, this is a no-op since the page was already marked
> + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> + * cache lines directly.
> + */
> +void flush_kernel_dcache_page(struct page *page)
> +{
> +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> +		struct address_space *mapping;
> +
> +		mapping = page_mapping(page);
> +
> +		if (!mapping || mapping_mapped(mapping))
> +			__flush_kernel_dcache_page(page);
> +	}
> +}

BTW, does the mapping check optimise anything for the
flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
anonymous page) in most cases?

Otherwise the patch looks good.
Nicolas Pitre May 29, 2013, 2:39 p.m. UTC | #10
On Wed, 29 May 2013, Catalin Marinas wrote:

> On Tue, May 28, 2013 at 06:52:27PM +0100, Nicolas Pitre wrote:
> > On Tue, 28 May 2013, Catalin Marinas wrote:
> > 
> > > On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> > > > On Thu, 23 May 2013, Catalin Marinas wrote:
> > > > 
> > > > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > > > twice. I don't think we should remove the cache flushing in
> > > > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > > > don't work together).
> > > > 
> > > > VIVT and highmem do work together.  Highmem for ARM was in fact 
> > > > developed on such a platform.
> > > 
> > > Thanks for confirming. Do you remember why kunmap() doesn't (need to)
> > > flush the cache alias (for VIVT)? Or if it does, where?
> > 
> > kunmap() calls kunmap_high() which actually does not unmap anything but 
> > only decrement a use count.  The page is kept around in case it is 
> > kmap()'d again, in which case the kmap() will be almost free and the 
> > page still warm.
> > 
> > It is only when the kmap() for a new page runs out of free pkmap  
> > entries that the unused but still mapped pages are unmapped.  This 
> > unmapping is done in flush_all_zero_pkmaps() where all the unused pages 
> > are all unmapped at once.  The cache for all those pages is flushed with 
> > flush_cache_kmaps() which on ARM maps to flush_cache_all() when the 
> > cache is VIVT.
> 
> Thanks for clarification Nico. So if kmap_high_get() returns NULL, there
> are no high aliases left in the cache.

Exact.


Nicolas
Nicolas Pitre May 29, 2013, 4:32 p.m. UTC | #11
On Wed, 29 May 2013, Nicolas Pitre wrote:

> On Wed, 29 May 2013, Catalin Marinas wrote:
> 
> > On Tue, May 28, 2013 at 06:52:27PM +0100, Nicolas Pitre wrote:
> > > On Tue, 28 May 2013, Catalin Marinas wrote:
> > > 
> > > > On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> > > > > On Thu, 23 May 2013, Catalin Marinas wrote:
> > > > > 
> > > > > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > > > > twice. I don't think we should remove the cache flushing in
> > > > > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > > > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > > > > don't work together).
> > > > > 
> > > > > VIVT and highmem do work together.  Highmem for ARM was in fact 
> > > > > developed on such a platform.
> > > > 
> > > > Thanks for confirming. Do you remember why kunmap() doesn't (need to)
> > > > flush the cache alias (for VIVT)? Or if it does, where?
> > > 
> > > kunmap() calls kunmap_high() which actually does not unmap anything but 
> > > only decrement a use count.  The page is kept around in case it is 
> > > kmap()'d again, in which case the kmap() will be almost free and the 
> > > page still warm.
> > > 
> > > It is only when the kmap() for a new page runs out of free pkmap  
> > > entries that the unused but still mapped pages are unmapped.  This 
> > > unmapping is done in flush_all_zero_pkmaps() where all the unused pages 
> > > are all unmapped at once.  The cache for all those pages is flushed with 
> > > flush_cache_kmaps() which on ARM maps to flush_cache_all() when the 
> > > cache is VIVT.
> > 
> > Thanks for clarification Nico. So if kmap_high_get() returns NULL, there
> > are no high aliases left in the cache.
> 
> Exact.

Just to be clear, this is true for VIVT only.
Unmapped highmem pages can still be cached with VIPT.


Nicolas
Catalin Marinas May 29, 2013, 5:33 p.m. UTC | #12
On Wed, May 29, 2013 at 05:32:50PM +0100, Nicolas Pitre wrote:
> On Wed, 29 May 2013, Nicolas Pitre wrote:
> 
> > On Wed, 29 May 2013, Catalin Marinas wrote:
> > 
> > > On Tue, May 28, 2013 at 06:52:27PM +0100, Nicolas Pitre wrote:
> > > > On Tue, 28 May 2013, Catalin Marinas wrote:
> > > > 
> > > > > On Sat, May 25, 2013 at 04:53:19AM +0100, Nicolas Pitre wrote:
> > > > > > On Thu, 23 May 2013, Catalin Marinas wrote:
> > > > > > 
> > > > > > > An issue is that for kunmap_atomic() && VIVT we flush the same page
> > > > > > > twice. I don't think we should remove the cache flushing in
> > > > > > > __kunmap_atomic() for VIVT since not all kunmap_atomic() calls require
> > > > > > > flush_kernel_dcache_page() (unless we just assume that highmem && vivt
> > > > > > > don't work together).
> > > > > > 
> > > > > > VIVT and highmem do work together.  Highmem for ARM was in fact 
> > > > > > developed on such a platform.
> > > > > 
> > > > > Thanks for confirming. Do you remember why kunmap() doesn't (need to)
> > > > > flush the cache alias (for VIVT)? Or if it does, where?
> > > > 
> > > > kunmap() calls kunmap_high() which actually does not unmap anything but 
> > > > only decrement a use count.  The page is kept around in case it is 
> > > > kmap()'d again, in which case the kmap() will be almost free and the 
> > > > page still warm.
> > > > 
> > > > It is only when the kmap() for a new page runs out of free pkmap  
> > > > entries that the unused but still mapped pages are unmapped.  This 
> > > > unmapping is done in flush_all_zero_pkmaps() where all the unused pages 
> > > > are all unmapped at once.  The cache for all those pages is flushed with 
> > > > flush_cache_kmaps() which on ARM maps to flush_cache_all() when the 
> > > > cache is VIVT.
> > > 
> > > Thanks for clarification Nico. So if kmap_high_get() returns NULL, there
> > > are no high aliases left in the cache.
> > 
> > Exact.
> 
> Just to be clear, this is true for VIVT only.
> Unmapped highmem pages can still be cached with VIPT.

Indeed.
Simon Baatz May 29, 2013, 7:16 p.m. UTC | #13
Hi Catalin,

On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > it needs to handle are kernel mapped only.  However, for example when doing
> > direct I/O, pages with user space mappings may occur.
> 
> After Nico's clarification, I think the original commit introducing this
> function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> their flushing could be deferred for a long time.

Yes, I agree.
 
> For my understanding (if I re-read this tread) - basically code like
> this should not leave the user mapping inconsistent:
> 
> kmap()
> ...
> flush_kernel_dcache_page()
> kunmap()

s/user mapping/kernel mapping/ 
The user mapping(s) can be assumed to be consistent when entering
such code. Either there is none or the page was obtained by a
mechanism like __get_user_pages(). Otherwise, we have a problem
anyway when accessing the page via the kernel mapping.

The visibility of a page to user space is just needed for
optimization.  If the page is not visible to user space, we can use
our lazy flushing mechanism (PG_dcache_clean bit) and defer flushing
until the page is mapped into user space (if ever).  Otherwise, we
need to flush the kernel mapping since only that can be dirty.

And, we don't need to do anything in flush_kernel_dcache_page() for
non-aliasing caches, of course.
 
> If we use the atomic variants, we get the cache flushing automatically
> but the kunmap_high() does not flush the cache immediately, so we need
> to handle it in flush_kernel_dcache_page().

Exactly. But only if it is really needed. If we have no user space
mapping and don't lose the kernel mapping, we just may do nothing.
 
> > Thus, continue to do lazy flushing if there are no user space mappings.
> > Otherwise, flush the kernel cache lines directly.
> ...
> >  /*
> > + * Ensure cache coherency for kernel mapping of this page.
> > + *
> > + * If the page only exists in the page cache and there are no user
> > + * space mappings, this is a no-op since the page was already marked
> > + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> > + * cache lines directly.
> > + */
> > +void flush_kernel_dcache_page(struct page *page)
> > +{
> > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > +		struct address_space *mapping;
> > +
> > +		mapping = page_mapping(page);
> > +
> > +		if (!mapping || mapping_mapped(mapping))
> > +			__flush_kernel_dcache_page(page);
> > +	}
> > +}
> 
> BTW, does the mapping check optimise anything for the
> flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
> anonymous page) in most cases?


Looking at the relevant uses in the kernel, we have:

    drivers/scsi/libsas/sas_host_smp.c
    drivers/mmc/host/mmc_spi.c
    drivers/block/ps3disk.c
    fs/exec.c
    lib/scatterlist.c
 
That is not much (I omit my usual rant here that many uses of
flush_dcache_page() in drivers are incorrect and should be uses of
flush_kernel_dcache_page() ...).

Without looking at the actual code, we seem to have two basic use
cases:

1. Block drivers (the ones listed above + those using the memory
scatterlist iterator in scatterlist.c)

These will probably almost exclusively use page cache pages for which
we can be lazy.  Other pages only occur in special I/O situations
like direct I/O.

2. fs/exec.c

I think these are anonymous pages only


Thus depending on the actual drivers used, usage can be dominated by
page cache pages on one setup and anon pages on the other.

I prefer the currently proposed way since it prevents massive
overflushing if flush_kernel_dcache_page() is used in an I/O path.


(Btw. I am still puzzled as to why making flush_kernel_dcache_page()
the current nop apparently did not cause problems in fs/exec.c.)

> Otherwise the patch looks good.

Thanks. Especially, thanks for pointing out the highmem case.


- Simon
Catalin Marinas May 30, 2013, 4:43 p.m. UTC | #14
On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > it needs to handle are kernel mapped only.  However, for example when doing
> > > direct I/O, pages with user space mappings may occur.
> > 
> > After Nico's clarification, I think the original commit introducing this
> > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > their flushing could be deferred for a long time.
> 
> Yes, I agree.
>  
> > For my understanding (if I re-read this tread) - basically code like
> > this should not leave the user mapping inconsistent:
> > 
> > kmap()
> > ...
> > flush_kernel_dcache_page()
> > kunmap()
> 
> s/user mapping/kernel mapping/ 
> The user mapping(s) can be assumed to be consistent when entering
> such code. Either there is none or the page was obtained by a
> mechanism like __get_user_pages(). Otherwise, we have a problem
> anyway when accessing the page via the kernel mapping.

Indeed. What I meant was user mapping inconsistent with the kernel
mapping. It's the latter that needs flushing.

> > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > Otherwise, flush the kernel cache lines directly.
> > ...
> > >  /*
> > > + * Ensure cache coherency for kernel mapping of this page.
> > > + *
> > > + * If the page only exists in the page cache and there are no user
> > > + * space mappings, this is a no-op since the page was already marked
> > > + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> > > + * cache lines directly.
> > > + */
> > > +void flush_kernel_dcache_page(struct page *page)
> > > +{
> > > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > > +		struct address_space *mapping;
> > > +
> > > +		mapping = page_mapping(page);
> > > +
> > > +		if (!mapping || mapping_mapped(mapping))
> > > +			__flush_kernel_dcache_page(page);
> > > +	}
> > > +}
> > 
> > BTW, does the mapping check optimise anything for the
> > flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
> > anonymous page) in most cases?
> 
> Looking at the relevant uses in the kernel, we have:
> 
>     drivers/scsi/libsas/sas_host_smp.c
>     drivers/mmc/host/mmc_spi.c
>     drivers/block/ps3disk.c
>     fs/exec.c
>     lib/scatterlist.c
>  
> That is not much (I omit my usual rant here that many uses of
> flush_dcache_page() in drivers are incorrect and should be uses of
> flush_kernel_dcache_page() ...).
> 
> Without looking at the actual code, we seem to have two basic use
> cases:
> 
> 1. Block drivers (the ones listed above + those using the memory
> scatterlist iterator in scatterlist.c)
> 
> These will probably almost exclusively use page cache pages for which
> we can be lazy.  Other pages only occur in special I/O situations
> like direct I/O.
> 
> 2. fs/exec.c
> 
> I think these are anonymous pages only
> 
> Thus depending on the actual drivers used, usage can be dominated by
> page cache pages on one setup and anon pages on the other.
> 
> I prefer the currently proposed way since it prevents massive
> overflushing if flush_kernel_dcache_page() is used in an I/O path.
> 
> (Btw. I am still puzzled as to why making flush_kernel_dcache_page()
> the current nop apparently did not cause problems in fs/exec.c.)

We get extra flushing via flush_old_exec() -> exec_mmap() -> mmput() ->
exit_mmap() -> flush_cache_mm() before we actually start the new exec so
this would flush the arg page as well.

> > Otherwise the patch looks good.
> 
> Thanks. Especially, thanks for pointing out the highmem case.

You can add my ack (before I forget the whole discussion ;))

Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Ming Lei May 31, 2013, 9:07 a.m. UTC | #15
On Sun, May 12, 2013 at 1:35 PM, Simon Baatz <gmbnomis@gmail.com> wrote:
> +void flush_kernel_dcache_page(struct page *page)
> +{
> +       if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> +               struct address_space *mapping;
> +
> +               mapping = page_mapping(page);
> +
> +               if (!mapping || mapping_mapped(mapping))

It is better to replace mapping_mapped(mapping) with page_mapped(page).

> +                       __flush_kernel_dcache_page(page);

I am wondering if I-cache should be flushed here since the user mapping
on the page may be executable and I/D coherency should be maintained.


Thanks,
Jason Cooper May 31, 2013, 12:05 p.m. UTC | #16
On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > direct I/O, pages with user space mappings may occur.
> > > 
> > > After Nico's clarification, I think the original commit introducing this
> > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > their flushing could be deferred for a long time.
> > 
> > Yes, I agree.
> >  
> > > For my understanding (if I re-read this tread) - basically code like
> > > this should not leave the user mapping inconsistent:
> > > 
> > > kmap()
> > > ...
> > > flush_kernel_dcache_page()
> > > kunmap()
> > 
> > s/user mapping/kernel mapping/ 
> > The user mapping(s) can be assumed to be consistent when entering
> > such code. Either there is none or the page was obtained by a
> > mechanism like __get_user_pages(). Otherwise, we have a problem
> > anyway when accessing the page via the kernel mapping.
> 
> Indeed. What I meant was user mapping inconsistent with the kernel
> mapping. It's the latter that needs flushing.
> 
> > > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > > Otherwise, flush the kernel cache lines directly.
> > > ...
> > > >  /*
> > > > + * Ensure cache coherency for kernel mapping of this page.
> > > > + *
> > > > + * If the page only exists in the page cache and there are no user
> > > > + * space mappings, this is a no-op since the page was already marked
> > > > + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> > > > + * cache lines directly.
> > > > + */
> > > > +void flush_kernel_dcache_page(struct page *page)
> > > > +{
> > > > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > > > +		struct address_space *mapping;
> > > > +
> > > > +		mapping = page_mapping(page);
> > > > +
> > > > +		if (!mapping || mapping_mapped(mapping))
> > > > +			__flush_kernel_dcache_page(page);
> > > > +	}
> > > > +}
> > > 
> > > BTW, does the mapping check optimise anything for the
> > > flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
> > > anonymous page) in most cases?
> > 
> > Looking at the relevant uses in the kernel, we have:
> > 
> >     drivers/scsi/libsas/sas_host_smp.c
> >     drivers/mmc/host/mmc_spi.c
> >     drivers/block/ps3disk.c
> >     fs/exec.c
> >     lib/scatterlist.c
> >  
> > That is not much (I omit my usual rant here that many uses of
> > flush_dcache_page() in drivers are incorrect and should be uses of
> > flush_kernel_dcache_page() ...).
> > 
> > Without looking at the actual code, we seem to have two basic use
> > cases:
> > 
> > 1. Block drivers (the ones listed above + those using the memory
> > scatterlist iterator in scatterlist.c)
> > 
> > These will probably almost exclusively use page cache pages for which
> > we can be lazy.  Other pages only occur in special I/O situations
> > like direct I/O.
> > 
> > 2. fs/exec.c
> > 
> > I think these are anonymous pages only
> > 
> > Thus depending on the actual drivers used, usage can be dominated by
> > page cache pages on one setup and anon pages on the other.
> > 
> > I prefer the currently proposed way since it prevents massive
> > overflushing if flush_kernel_dcache_page() is used in an I/O path.
> > 
> > (Btw. I am still puzzled as to why making flush_kernel_dcache_page()
> > the current nop apparently did not cause problems in fs/exec.c.)
> 
> We get extra flushing via flush_old_exec() -> exec_mmap() -> mmput() ->
> exit_mmap() -> flush_cache_mm() before we actually start the new exec so
> this would flush the arg page as well.
> 
> > > Otherwise the patch looks good.
> > 
> > Thanks. Especially, thanks for pointing out the highmem case.
> 
> You can add my ack (before I forget the whole discussion ;))
> 
> Acked-by: Catalin Marinas <catalin.marinas@arm.com>

wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
Simon go ahead and put this in rmk's patch tracker and mention that it
should go to all -stable trees?

thx,

Jason.
Catalin Marinas May 31, 2013, 2:15 p.m. UTC | #17
On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > direct I/O, pages with user space mappings may occur.
> > > > 
> > > > After Nico's clarification, I think the original commit introducing this
> > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > their flushing could be deferred for a long time.
> > > 
> > > Yes, I agree.
> > >  
> > > > For my understanding (if I re-read this tread) - basically code like
> > > > this should not leave the user mapping inconsistent:
> > > > 
> > > > kmap()
> > > > ...
> > > > flush_kernel_dcache_page()
> > > > kunmap()
> > > 
> > > s/user mapping/kernel mapping/ 
> > > The user mapping(s) can be assumed to be consistent when entering
> > > such code. Either there is none or the page was obtained by a
> > > mechanism like __get_user_pages(). Otherwise, we have a problem
> > > anyway when accessing the page via the kernel mapping.
> > 
> > Indeed. What I meant was user mapping inconsistent with the kernel
> > mapping. It's the latter that needs flushing.
> > 
> > > > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > > > Otherwise, flush the kernel cache lines directly.
> > > > ...
> > > > >  /*
> > > > > + * Ensure cache coherency for kernel mapping of this page.
> > > > > + *
> > > > > + * If the page only exists in the page cache and there are no user
> > > > > + * space mappings, this is a no-op since the page was already marked
> > > > > + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> > > > > + * cache lines directly.
> > > > > + */
> > > > > +void flush_kernel_dcache_page(struct page *page)
> > > > > +{
> > > > > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > > > > +		struct address_space *mapping;
> > > > > +
> > > > > +		mapping = page_mapping(page);
> > > > > +
> > > > > +		if (!mapping || mapping_mapped(mapping))
> > > > > +			__flush_kernel_dcache_page(page);
> > > > > +	}
> > > > > +}
> > > > 
> > > > BTW, does the mapping check optimise anything for the
> > > > flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
> > > > anonymous page) in most cases?
> > > 
> > > Looking at the relevant uses in the kernel, we have:
> > > 
> > >     drivers/scsi/libsas/sas_host_smp.c
> > >     drivers/mmc/host/mmc_spi.c
> > >     drivers/block/ps3disk.c
> > >     fs/exec.c
> > >     lib/scatterlist.c
> > >  
> > > That is not much (I omit my usual rant here that many uses of
> > > flush_dcache_page() in drivers are incorrect and should be uses of
> > > flush_kernel_dcache_page() ...).
> > > 
> > > Without looking at the actual code, we seem to have two basic use
> > > cases:
> > > 
> > > 1. Block drivers (the ones listed above + those using the memory
> > > scatterlist iterator in scatterlist.c)
> > > 
> > > These will probably almost exclusively use page cache pages for which
> > > we can be lazy.  Other pages only occur in special I/O situations
> > > like direct I/O.
> > > 
> > > 2. fs/exec.c
> > > 
> > > I think these are anonymous pages only
> > > 
> > > Thus depending on the actual drivers used, usage can be dominated by
> > > page cache pages on one setup and anon pages on the other.
> > > 
> > > I prefer the currently proposed way since it prevents massive
> > > overflushing if flush_kernel_dcache_page() is used in an I/O path.
> > > 
> > > (Btw. I am still puzzled as to why making flush_kernel_dcache_page()
> > > the current nop apparently did not cause problems in fs/exec.c.)
> > 
> > We get extra flushing via flush_old_exec() -> exec_mmap() -> mmput() ->
> > exit_mmap() -> flush_cache_mm() before we actually start the new exec so
> > this would flush the arg page as well.
> > 
> > > > Otherwise the patch looks good.
> > > 
> > > Thanks. Especially, thanks for pointing out the highmem case.
> > 
> > You can add my ack (before I forget the whole discussion ;))
> > 
> > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> 
> wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> Simon go ahead and put this in rmk's patch tracker and mention that it
> should go to all -stable trees?

I'm not sure how easy it is to apply this patch on past stable versions.
Maybe something simpler for stable like always flushing the cache in
flush_kernel_dcache_page() with optimisation only in recent stable
versions.
Jason Cooper May 31, 2013, 2:20 p.m. UTC | #18
On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > direct I/O, pages with user space mappings may occur.
> > > > > 
> > > > > After Nico's clarification, I think the original commit introducing this
> > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > their flushing could be deferred for a long time.
> > > > 
> > > > Yes, I agree.
> > > >  
> > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > this should not leave the user mapping inconsistent:
> > > > > 
> > > > > kmap()
> > > > > ...
> > > > > flush_kernel_dcache_page()
> > > > > kunmap()
> > > > 
> > > > s/user mapping/kernel mapping/ 
> > > > The user mapping(s) can be assumed to be consistent when entering
> > > > such code. Either there is none or the page was obtained by a
> > > > mechanism like __get_user_pages(). Otherwise, we have a problem
> > > > anyway when accessing the page via the kernel mapping.
> > > 
> > > Indeed. What I meant was user mapping inconsistent with the kernel
> > > mapping. It's the latter that needs flushing.
> > > 
> > > > > > Thus, continue to do lazy flushing if there are no user space mappings.
> > > > > > Otherwise, flush the kernel cache lines directly.
> > > > > ...
> > > > > >  /*
> > > > > > + * Ensure cache coherency for kernel mapping of this page.
> > > > > > + *
> > > > > > + * If the page only exists in the page cache and there are no user
> > > > > > + * space mappings, this is a no-op since the page was already marked
> > > > > > + * dirty at creation.  Otherwise, we need to flush the dirty kernel
> > > > > > + * cache lines directly.
> > > > > > + */
> > > > > > +void flush_kernel_dcache_page(struct page *page)
> > > > > > +{
> > > > > > +	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > > > > > +		struct address_space *mapping;
> > > > > > +
> > > > > > +		mapping = page_mapping(page);
> > > > > > +
> > > > > > +		if (!mapping || mapping_mapped(mapping))
> > > > > > +			__flush_kernel_dcache_page(page);
> > > > > > +	}
> > > > > > +}
> > > > > 
> > > > > BTW, does the mapping check optimise anything for the
> > > > > flush_kernel_dcache_page() uses? Would we have a mapping anyway (or
> > > > > anonymous page) in most cases?
> > > > 
> > > > Looking at the relevant uses in the kernel, we have:
> > > > 
> > > >     drivers/scsi/libsas/sas_host_smp.c
> > > >     drivers/mmc/host/mmc_spi.c
> > > >     drivers/block/ps3disk.c
> > > >     fs/exec.c
> > > >     lib/scatterlist.c
> > > >  
> > > > That is not much (I omit my usual rant here that many uses of
> > > > flush_dcache_page() in drivers are incorrect and should be uses of
> > > > flush_kernel_dcache_page() ...).
> > > > 
> > > > Without looking at the actual code, we seem to have two basic use
> > > > cases:
> > > > 
> > > > 1. Block drivers (the ones listed above + those using the memory
> > > > scatterlist iterator in scatterlist.c)
> > > > 
> > > > These will probably almost exclusively use page cache pages for which
> > > > we can be lazy.  Other pages only occur in special I/O situations
> > > > like direct I/O.
> > > > 
> > > > 2. fs/exec.c
> > > > 
> > > > I think these are anonymous pages only
> > > > 
> > > > Thus depending on the actual drivers used, usage can be dominated by
> > > > page cache pages on one setup and anon pages on the other.
> > > > 
> > > > I prefer the currently proposed way since it prevents massive
> > > > overflushing if flush_kernel_dcache_page() is used in an I/O path.
> > > > 
> > > > (Btw. I am still puzzled as to why making flush_kernel_dcache_page()
> > > > the current nop apparently did not cause problems in fs/exec.c.)
> > > 
> > > We get extra flushing via flush_old_exec() -> exec_mmap() -> mmput() ->
> > > exit_mmap() -> flush_cache_mm() before we actually start the new exec so
> > > this would flush the arg page as well.
> > > 
> > > > > Otherwise the patch looks good.
> > > > 
> > > > Thanks. Especially, thanks for pointing out the highmem case.
> > > 
> > > You can add my ack (before I forget the whole discussion ;))
> > > 
> > > Acked-by: Catalin Marinas <catalin.marinas@arm.com>
> > 
> > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > Simon go ahead and put this in rmk's patch tracker and mention that it
> > should go to all -stable trees?
> 
> I'm not sure how easy it is to apply this patch on past stable versions.
> Maybe something simpler for stable like always flushing the cache in
> flush_kernel_dcache_page() with optimisation only in recent stable
> versions.

Ok, then please don't let that hold this up.  We can do something separate
for stable later if it's desired.

thx,

Jason.
Simon Baatz May 31, 2013, 6:54 p.m. UTC | #19
On Fri, May 31, 2013 at 05:07:22PM +0800, Ming Lei wrote:
> On Sun, May 12, 2013 at 1:35 PM, Simon Baatz <gmbnomis@gmail.com> wrote:
> > +void flush_kernel_dcache_page(struct page *page)
> > +{
> > +       if (cache_is_vivt() || cache_is_vipt_aliasing()) {
> > +               struct address_space *mapping;
> > +
> > +               mapping = page_mapping(page);
> > +
> > +               if (!mapping || mapping_mapped(mapping))
> 
> It is better to replace mapping_mapped(mapping) with page_mapped(page).

Good point. I did not have the time to look into your proposed change
for flush_dcache_page() yet.

However, this patch here is a fix for a long standing bug, which
should also be applied to stable in one form or another.  I would not
like to mix it with a separate performance improvement.

But, of course, if we decide to change the respective condition in
flush_dcache_page(), we should also change it in
flush_kernel_dcache_page() at the same time (provided this patch gets
accepted).
 
> > +                       __flush_kernel_dcache_page(page);
> 
> I am wondering if I-cache should be flushed here since the user mapping
> on the page may be executable and I/D coherency should be maintained.

Me too. In fact, I wondered so much that I proposed it myself in a
previous version of the patch (see [1]) :-) However, this has never
been part of flush_kernel_dcache_page() and it is not clear whether
the flush for I/D-cache coherency is really needed here (or in
flush_dcache_page() for that matter) or whether it is sufficient to
do it when mapping the page into user space.

On top of that neither Russell nor Catalin liked this idea very much.

- Simon


[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/124292.html
Ming Lei June 1, 2013, 10:27 a.m. UTC | #20
On Sat, Jun 1, 2013 at 2:54 AM, Simon Baatz <gmbnomis@gmail.com> wrote:
> On Fri, May 31, 2013 at 05:07:22PM +0800, Ming Lei wrote:
>> On Sun, May 12, 2013 at 1:35 PM, Simon Baatz <gmbnomis@gmail.com> wrote:
>> > +void flush_kernel_dcache_page(struct page *page)
>> > +{
>> > +       if (cache_is_vivt() || cache_is_vipt_aliasing()) {
>> > +               struct address_space *mapping;
>> > +
>> > +               mapping = page_mapping(page);
>> > +
>> > +               if (!mapping || mapping_mapped(mapping))
>>
>> It is better to replace mapping_mapped(mapping) with page_mapped(page).
>
> Good point. I did not have the time to look into your proposed change
> for flush_dcache_page() yet.
>
> However, this patch here is a fix for a long standing bug, which
> should also be applied to stable in one form or another.  I would not
> like to mix it with a separate performance improvement.

OK, got it, so you should have added "CC: stable", and it is better to add
the bug fix description in your change log, :-)

>
> But, of course, if we decide to change the respective condition in
> flush_dcache_page(), we should also change it in
> flush_kernel_dcache_page() at the same time (provided this patch gets
> accepted).
>
>> > +                       __flush_kernel_dcache_page(page);
>>
>> I am wondering if I-cache should be flushed here since the user mapping
>> on the page may be executable and I/D coherency should be maintained.
>
> Me too. In fact, I wondered so much that I proposed it myself in a
> previous version of the patch (see [1]) :-) However, this has never
> been part of flush_kernel_dcache_page() and it is not clear whether
> the flush for I/D-cache coherency is really needed here (or in
> flush_dcache_page() for that matter) or whether it is sufficient to
> do it when mapping the page into user space.

After reading cachetlb.txt again, looks your patch is correct, because
this kernel API is called in the situations:

        It is assumed here that the user has no incoherent cached copies
       (i.e. the original page was obtained from a mechanism like
get_user_pages()).

So only flushing kernel mapping for the page is correct if callers of the API
obey the rule.


Thanks,
Catalin Marinas June 3, 2013, 9:33 a.m. UTC | #21
On Sat, Jun 01, 2013 at 11:27:38AM +0100, Ming Lei wrote:
> On Sat, Jun 1, 2013 at 2:54 AM, Simon Baatz <gmbnomis@gmail.com> wrote:
> > On Fri, May 31, 2013 at 05:07:22PM +0800, Ming Lei wrote:
> >> On Sun, May 12, 2013 at 1:35 PM, Simon Baatz <gmbnomis@gmail.com> wrote:
> >> > +                       __flush_kernel_dcache_page(page);
> >>
> >> I am wondering if I-cache should be flushed here since the user mapping
> >> on the page may be executable and I/D coherency should be maintained.
> >
> > Me too. In fact, I wondered so much that I proposed it myself in a
> > previous version of the patch (see [1]) :-) However, this has never
> > been part of flush_kernel_dcache_page() and it is not clear whether
> > the flush for I/D-cache coherency is really needed here (or in
> > flush_dcache_page() for that matter) or whether it is sufficient to
> > do it when mapping the page into user space.
> 
> After reading cachetlb.txt again, looks your patch is correct, because
> this kernel API is called in the situations:
> 
>         It is assumed here that the user has no incoherent cached copies
>        (i.e. the original page was obtained from a mechanism like
> get_user_pages()).
> 
> So only flushing kernel mapping for the page is correct if callers of the API
> obey the rule.

I think we should stick with just the D-cache flushing here. We can
revisit when we see a clear scenario where I-cache maintenance is
needed.

Regarding the 'user has no incoherent cached copies', this would work on
ARMv5 but not on ARMv7 unless you unmap the page from user space
(because of speculative loads into both I and D caches). Since we only
target the D-cache, we just leave flush_kernel_dcache_page() as no-op on
ARMv7 (non-aliasing VIPT).
Simon Baatz June 3, 2013, 5:38 p.m. UTC | #22
Hi Catalin,

On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > direct I/O, pages with user space mappings may occur.
> > > > > 
> > > > > After Nico's clarification, I think the original commit introducing this
> > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > their flushing could be deferred for a long time.
> > > > 
> > > > Yes, I agree.
> > > >  
> > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > this should not leave the user mapping inconsistent:
> > > > > 
> > > > > kmap()
> > > > > ...
> > > > > flush_kernel_dcache_page()
> > > > > kunmap()
> ...
> > 
> > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > Simon go ahead and put this in rmk's patch tracker and mention that it
> > should go to all -stable trees?
> 
> I'm not sure how easy it is to apply this patch on past stable versions.
> Maybe something simpler for stable like always flushing the cache in
> flush_kernel_dcache_page() with optimisation only in recent stable
> versions.

Looking at how to adapt the change to previous kernel versions, it
occurred to me that we could probably simplify the current patch
since we can make stronger assumptions than __flush_dcache_page()
can do. As discussed, __flush_dcache_page() does:

                        addr = kmap_high_get(page);
                        if (addr) {
                                __cpuc_flush_dcache_area(addr, PAGE_SIZE);
                                kunmap_high(page);
                        }

The kmap_high_get() is used to ensure that the page is/remains pinned
during the flush.  However, the API of flush_kernel_dcache_page()
ensures that the page is already kmapped when it is called (as in the
quoted example above).

Thus, it is already pinned and we can simply look at page_address(). 
If it is NULL, the page must have been obtained via kmap_atomic() and
we don't need to flush anyway since kunmap_atomic() will do this. 

The actual flush covering both the lowmem and highmem cases actually
is as simple as this (inspired by __flush_dcache_page() in 2.6.32):

	addr = page_address(page);
#ifdef CONFIG_HIGHMEM
	/*
	 * kmap_atomic() doesn't set the page virtual address, and
	 * kunmap_atomic() takes care of cache flushing already.
	 */
	if (addr)
#endif
		__cpuc_flush_dcache_area(addr, PAGE_SIZE);

If you agree that this makes sense, I will send a new version.

If we change the patch this way, there is no need to split off
__flush_kernel_dcache_page() from __flush_dcache_page(), which also
makes it simpler to apply to past stable kernels.

The only thing we need to be aware of is the change of
flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
versions, we would only need to fix the highmem case.  However, I
wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
without this ever being reported as a problem.


- Simon
Jason Cooper June 3, 2013, 6:03 p.m. UTC | #23
All,

Adding gregkh for insight into -stable and applying fixes to previous
kernels when the bug was not against them (but the offending commit
exists in them).

Greg,

Some history:  This bug stems from attempting to write O_DIRECT to a
dm-crypt device on ARM architectures.  In my case, it was trying (years
ago!) to put an LVM into a dm-crypt container.  This data corruption can
be more readily seen by using dd to write to a dm-crypt device with the
direct flag.

On Mon, Jun 03, 2013 at 07:38:18PM +0200, Simon Baatz wrote:
> Hi Catalin,
> 
> On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> > On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > > direct I/O, pages with user space mappings may occur.
> > > > > > 
> > > > > > After Nico's clarification, I think the original commit introducing this
> > > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > > their flushing could be deferred for a long time.
> > > > > 
> > > > > Yes, I agree.
> > > > >  
> > > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > > this should not leave the user mapping inconsistent:
> > > > > > 
> > > > > > kmap()
> > > > > > ...
> > > > > > flush_kernel_dcache_page()
> > > > > > kunmap()
> > ...
> > > 
> > > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > > Simon go ahead and put this in rmk's patch tracker and mention that it
> > > should go to all -stable trees?
> > 
> > I'm not sure how easy it is to apply this patch on past stable versions.
> > Maybe something simpler for stable like always flushing the cache in
> > flush_kernel_dcache_page() with optimisation only in recent stable
> > versions.
> 
> Looking at how to adapt the change to previous kernel versions, it
> occurred to me that we could probably simplify the current patch
> since we can make stronger assumptions than __flush_dcache_page()
> can do. As discussed, __flush_dcache_page() does:
> 
>                         addr = kmap_high_get(page);
>                         if (addr) {
>                                 __cpuc_flush_dcache_area(addr, PAGE_SIZE);
>                                 kunmap_high(page);
>                         }
> 
> The kmap_high_get() is used to ensure that the page is/remains pinned
> during the flush.  However, the API of flush_kernel_dcache_page()
> ensures that the page is already kmapped when it is called (as in the
> quoted example above).
> 
> Thus, it is already pinned and we can simply look at page_address(). 
> If it is NULL, the page must have been obtained via kmap_atomic() and
> we don't need to flush anyway since kunmap_atomic() will do this. 
> 
> The actual flush covering both the lowmem and highmem cases actually
> is as simple as this (inspired by __flush_dcache_page() in 2.6.32):
> 
> 	addr = page_address(page);
> #ifdef CONFIG_HIGHMEM
> 	/*
> 	 * kmap_atomic() doesn't set the page virtual address, and
> 	 * kunmap_atomic() takes care of cache flushing already.
> 	 */
> 	if (addr)
> #endif
> 		__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> 
> If you agree that this makes sense, I will send a new version.
> 
> If we change the patch this way, there is no need to split off
> __flush_kernel_dcache_page() from __flush_dcache_page(), which also
> makes it simpler to apply to past stable kernels.
> 
> The only thing we need to be aware of is the change of
> flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> versions, we would only need to fix the highmem case.  However, I
> wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> without this ever being reported as a problem.

imho, if you discover a problem while running kernel v3.9.x, and the
commit causing the problem goes all the way back to 2.6.32, then the fix
should go back that far as well (assuming those kernels are still
actively maintained).  I don't think we would need a bug report for each
kernel version.

However, some more nuanced problems are not the result of a single
commit, but potentially the interaction of several commits.  In this
case, a more conservative application may be in order.

Greg, any thoughts?

thx,

Jason.
Simon Baatz June 3, 2013, 7:11 p.m. UTC | #24
Hi Jason, Greg,

On Mon, Jun 03, 2013 at 02:03:39PM -0400, Jason Cooper wrote:
> All,
> 
> Adding gregkh for insight into -stable and applying fixes to previous
> kernels when the bug was not against them (but the offending commit
> exists in them).
> 
> Greg,
> 
> Some history:  This bug stems from attempting to write O_DIRECT to a
> dm-crypt device on ARM architectures.  In my case, it was trying (years
> ago!) to put an LVM into a dm-crypt container.  This data corruption can
> be more readily seen by using dd to write to a dm-crypt device with the
> direct flag.

Some more history: The bug Jason describes is a regression that
appeared in 2.6.37.  However, while fixing it, we think we found a
bug in conjunction with highmem that goes back to 2.6.31 or so.  This
may occur on ARM machines with VIVT D-cache, highmem pages and a
driver using flush_kernel_dcache_page(), but not kmap_atomic().

> ...
> > 
> > The only thing we need to be aware of is the change of
> > flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> > versions, we would only need to fix the highmem case.  However, I
> > wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> > without this ever being reported as a problem.
> 
> imho, if you discover a problem while running kernel v3.9.x, and the
> commit causing the problem goes all the way back to 2.6.32, then the fix
> should go back that far as well (assuming those kernels are still
> actively maintained).  I don't think we would need a bug report for each
> kernel version.

For the bug you described that's true. But 2.6.32 and 2.6.34 are not
affected.  The question is whether the other bug requires such an
obscure set of conditions that there are simply no affected machines.


- Simon
Jason Cooper June 3, 2013, 7:22 p.m. UTC | #25
It would be helpful if I actually did as I stated ;-)

Adding Greg for real this time.

On Mon, Jun 03, 2013 at 02:03:39PM -0400, Jason Cooper wrote:
> All,
> 
> Adding gregkh for insight into -stable and applying fixes to previous
> kernels when the bug was not against them (but the offending commit
> exists in them).
> 
> Greg,
> 
> Some history:  This bug stems from attempting to write O_DIRECT to a
> dm-crypt device on ARM architectures.  In my case, it was trying (years
> ago!) to put an LVM into a dm-crypt container.  This data corruption can
> be more readily seen by using dd to write to a dm-crypt device with the
> direct flag.
> 
> On Mon, Jun 03, 2013 at 07:38:18PM +0200, Simon Baatz wrote:
> > Hi Catalin,
> > 
> > On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> > > On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > > > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > > > direct I/O, pages with user space mappings may occur.
> > > > > > > 
> > > > > > > After Nico's clarification, I think the original commit introducing this
> > > > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > > > their flushing could be deferred for a long time.
> > > > > > 
> > > > > > Yes, I agree.
> > > > > >  
> > > > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > > > this should not leave the user mapping inconsistent:
> > > > > > > 
> > > > > > > kmap()
> > > > > > > ...
> > > > > > > flush_kernel_dcache_page()
> > > > > > > kunmap()
> > > ...
> > > > 
> > > > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > > > Simon go ahead and put this in rmk's patch tracker and mention that it
> > > > should go to all -stable trees?
> > > 
> > > I'm not sure how easy it is to apply this patch on past stable versions.
> > > Maybe something simpler for stable like always flushing the cache in
> > > flush_kernel_dcache_page() with optimisation only in recent stable
> > > versions.
> > 
> > Looking at how to adapt the change to previous kernel versions, it
> > occurred to me that we could probably simplify the current patch
> > since we can make stronger assumptions than __flush_dcache_page()
> > can do. As discussed, __flush_dcache_page() does:
> > 
> >                         addr = kmap_high_get(page);
> >                         if (addr) {
> >                                 __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> >                                 kunmap_high(page);
> >                         }
> > 
> > The kmap_high_get() is used to ensure that the page is/remains pinned
> > during the flush.  However, the API of flush_kernel_dcache_page()
> > ensures that the page is already kmapped when it is called (as in the
> > quoted example above).
> > 
> > Thus, it is already pinned and we can simply look at page_address(). 
> > If it is NULL, the page must have been obtained via kmap_atomic() and
> > we don't need to flush anyway since kunmap_atomic() will do this. 
> > 
> > The actual flush covering both the lowmem and highmem cases actually
> > is as simple as this (inspired by __flush_dcache_page() in 2.6.32):
> > 
> > 	addr = page_address(page);
> > #ifdef CONFIG_HIGHMEM
> > 	/*
> > 	 * kmap_atomic() doesn't set the page virtual address, and
> > 	 * kunmap_atomic() takes care of cache flushing already.
> > 	 */
> > 	if (addr)
> > #endif
> > 		__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > 
> > If you agree that this makes sense, I will send a new version.
> > 
> > If we change the patch this way, there is no need to split off
> > __flush_kernel_dcache_page() from __flush_dcache_page(), which also
> > makes it simpler to apply to past stable kernels.
> > 
> > The only thing we need to be aware of is the change of
> > flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> > versions, we would only need to fix the highmem case.  However, I
> > wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> > without this ever being reported as a problem.
> 
> imho, if you discover a problem while running kernel v3.9.x, and the
> commit causing the problem goes all the way back to 2.6.32, then the fix
> should go back that far as well (assuming those kernels are still
> actively maintained).  I don't think we would need a bug report for each
> kernel version.
> 
> However, some more nuanced problems are not the result of a single
> commit, but potentially the interaction of several commits.  In this
> case, a more conservative application may be in order.
> 
> Greg, any thoughts?
> 
> thx,
> 
> Jason.
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Greg Kroah-Hartman June 3, 2013, 8:38 p.m. UTC | #26
On Mon, Jun 03, 2013 at 03:22:30PM -0400, Jason Cooper wrote:
> It would be helpful if I actually did as I stated ;-)
> 
> Adding Greg for real this time.
> 
> On Mon, Jun 03, 2013 at 02:03:39PM -0400, Jason Cooper wrote:
> > All,
> > 
> > Adding gregkh for insight into -stable and applying fixes to previous
> > kernels when the bug was not against them (but the offending commit
> > exists in them).
> > 
> > Greg,
> > 
> > Some history:  This bug stems from attempting to write O_DIRECT to a
> > dm-crypt device on ARM architectures.  In my case, it was trying (years
> > ago!) to put an LVM into a dm-crypt container.  This data corruption can
> > be more readily seen by using dd to write to a dm-crypt device with the
> > direct flag.
> > 
> > On Mon, Jun 03, 2013 at 07:38:18PM +0200, Simon Baatz wrote:
> > > Hi Catalin,
> > > 
> > > On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> > > > On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > > > > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > > > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > > > > direct I/O, pages with user space mappings may occur.
> > > > > > > > 
> > > > > > > > After Nico's clarification, I think the original commit introducing this
> > > > > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > > > > their flushing could be deferred for a long time.
> > > > > > > 
> > > > > > > Yes, I agree.
> > > > > > >  
> > > > > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > > > > this should not leave the user mapping inconsistent:
> > > > > > > > 
> > > > > > > > kmap()
> > > > > > > > ...
> > > > > > > > flush_kernel_dcache_page()
> > > > > > > > kunmap()
> > > > ...
> > > > > 
> > > > > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > > > > Simon go ahead and put this in rmk's patch tracker and mention that it
> > > > > should go to all -stable trees?
> > > > 
> > > > I'm not sure how easy it is to apply this patch on past stable versions.
> > > > Maybe something simpler for stable like always flushing the cache in
> > > > flush_kernel_dcache_page() with optimisation only in recent stable
> > > > versions.
> > > 
> > > Looking at how to adapt the change to previous kernel versions, it
> > > occurred to me that we could probably simplify the current patch
> > > since we can make stronger assumptions than __flush_dcache_page()
> > > can do. As discussed, __flush_dcache_page() does:
> > > 
> > >                         addr = kmap_high_get(page);
> > >                         if (addr) {
> > >                                 __cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > >                                 kunmap_high(page);
> > >                         }
> > > 
> > > The kmap_high_get() is used to ensure that the page is/remains pinned
> > > during the flush.  However, the API of flush_kernel_dcache_page()
> > > ensures that the page is already kmapped when it is called (as in the
> > > quoted example above).
> > > 
> > > Thus, it is already pinned and we can simply look at page_address(). 
> > > If it is NULL, the page must have been obtained via kmap_atomic() and
> > > we don't need to flush anyway since kunmap_atomic() will do this. 
> > > 
> > > The actual flush covering both the lowmem and highmem cases actually
> > > is as simple as this (inspired by __flush_dcache_page() in 2.6.32):
> > > 
> > > 	addr = page_address(page);
> > > #ifdef CONFIG_HIGHMEM
> > > 	/*
> > > 	 * kmap_atomic() doesn't set the page virtual address, and
> > > 	 * kunmap_atomic() takes care of cache flushing already.
> > > 	 */
> > > 	if (addr)
> > > #endif
> > > 		__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > > 
> > > If you agree that this makes sense, I will send a new version.
> > > 
> > > If we change the patch this way, there is no need to split off
> > > __flush_kernel_dcache_page() from __flush_dcache_page(), which also
> > > makes it simpler to apply to past stable kernels.
> > > 
> > > The only thing we need to be aware of is the change of
> > > flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> > > versions, we would only need to fix the highmem case.  However, I
> > > wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> > > without this ever being reported as a problem.
> > 
> > imho, if you discover a problem while running kernel v3.9.x, and the
> > commit causing the problem goes all the way back to 2.6.32, then the fix
> > should go back that far as well (assuming those kernels are still
> > actively maintained).  I don't think we would need a bug report for each
> > kernel version.
> > 
> > However, some more nuanced problems are not the result of a single
> > commit, but potentially the interaction of several commits.  In this
> > case, a more conservative application may be in order.
> > 
> > Greg, any thoughts?

If a patch needs to be backported to stable kernels, and it isn't
relevant at _all_ for Linus's tree due to it being fixed some other way
that is not backportable, then I can accept the patch into the stable
releases.

Otherwise, just send me the git commit ids of the fix in Linus's tree to
the stable@vger.kernel.org mailing list please.

thanks,

greg k-h
Catalin Marinas June 5, 2013, 1:58 p.m. UTC | #27
On Mon, Jun 03, 2013 at 06:38:18PM +0100, Simon Baatz wrote:
> On Fri, May 31, 2013 at 03:15:08PM +0100, Catalin Marinas wrote:
> > On Fri, May 31, 2013 at 01:05:19PM +0100, Jason Cooper wrote:
> > > On Thu, May 30, 2013 at 05:43:35PM +0100, Catalin Marinas wrote:
> > > > On Wed, May 29, 2013 at 08:16:57PM +0100, Simon Baatz wrote:
> > > > > On Wed, May 29, 2013 at 12:05:09PM +0100, Catalin Marinas wrote:
> > > > > > On Sun, May 12, 2013 at 06:35:56AM +0100, Simon Baatz wrote:
> > > > > > > Commit f8b63c1 made flush_kernel_dcache_page a no-op assuming that the pages
> > > > > > > it needs to handle are kernel mapped only.  However, for example when doing
> > > > > > > direct I/O, pages with user space mappings may occur.
> > > > > > 
> > > > > > After Nico's clarification, I think the original commit introducing this
> > > > > > function was also incomplete (commit 73be1591 - [ARM] 5545/2: add
> > > > > > flush_kernel_dcache_page() for ARM) since it ignores highmem pages and
> > > > > > their flushing could be deferred for a long time.
> > > > > 
> > > > > Yes, I agree.
> > > > >  
> > > > > > For my understanding (if I re-read this tread) - basically code like
> > > > > > this should not leave the user mapping inconsistent:
> > > > > > 
> > > > > > kmap()
> > > > > > ...
> > > > > > flush_kernel_dcache_page()
> > > > > > kunmap()
> > ...
> > > 
> > > wrt to 73be1591, this appears to go all the way back to v2.6.31.x... Can
> > > Simon go ahead and put this in rmk's patch tracker and mention that it
> > > should go to all -stable trees?
> > 
> > I'm not sure how easy it is to apply this patch on past stable versions.
> > Maybe something simpler for stable like always flushing the cache in
> > flush_kernel_dcache_page() with optimisation only in recent stable
> > versions.
> 
> Looking at how to adapt the change to previous kernel versions, it
> occurred to me that we could probably simplify the current patch
> since we can make stronger assumptions than __flush_dcache_page()
> can do. As discussed, __flush_dcache_page() does:
> 
>                         addr = kmap_high_get(page);
>                         if (addr) {
>                                 __cpuc_flush_dcache_area(addr, PAGE_SIZE);
>                                 kunmap_high(page);
>                         }
> 
> The kmap_high_get() is used to ensure that the page is/remains pinned
> during the flush.  However, the API of flush_kernel_dcache_page()
> ensures that the page is already kmapped when it is called (as in the
> quoted example above).
> 
> Thus, it is already pinned and we can simply look at page_address(). 
> If it is NULL, the page must have been obtained via kmap_atomic() and
> we don't need to flush anyway since kunmap_atomic() will do this. 

Yes, that's the behaviour for __flush_dcache_page() already (VIVT).

> The actual flush covering both the lowmem and highmem cases actually
> is as simple as this (inspired by __flush_dcache_page() in 2.6.32):
> 
> 	addr = page_address(page);
> #ifdef CONFIG_HIGHMEM
> 	/*
> 	 * kmap_atomic() doesn't set the page virtual address, and
> 	 * kunmap_atomic() takes care of cache flushing already.
> 	 */
> 	if (addr)
> #endif
> 		__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> 
> If you agree that this makes sense, I will send a new version.

It makes sense (with the addition of the cache_is_vivt() check). Maybe
something like

	if (IS_ENABLED(CONFIG_HIGHMEM) && addr)

to avoid the #ifdef.

> If we change the patch this way, there is no need to split off
> __flush_kernel_dcache_page() from __flush_dcache_page(), which also
> makes it simpler to apply to past stable kernels.
> 
> The only thing we need to be aware of is the change of
> flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> versions, we would only need to fix the highmem case.  However, I
> wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> without this ever being reported as a problem.

I think you can cc stable with a dependency for 2.6.37 onwards and send
a different patch for earlier versions directly to the
stable@vger.kernel.org alias, specifying the versions it needs to be
applied to.
Simon Baatz June 5, 2013, 7:55 p.m. UTC | #28
On Wed, Jun 05, 2013 at 02:58:17PM +0100, Catalin Marinas wrote:
> On Mon, Jun 03, 2013 at 06:38:18PM +0100, Simon Baatz wrote:
...
> > The actual flush covering both the lowmem and highmem cases actually
> > is as simple as this (inspired by __flush_dcache_page() in 2.6.32):
> > 
> > 	addr = page_address(page);
> > #ifdef CONFIG_HIGHMEM
> > 	/*
> > 	 * kmap_atomic() doesn't set the page virtual address, and
> > 	 * kunmap_atomic() takes care of cache flushing already.
> > 	 */
> > 	if (addr)
> > #endif
> > 		__cpuc_flush_dcache_area(addr, PAGE_SIZE);
> > 
> > If you agree that this makes sense, I will send a new version.
> 
> It makes sense (with the addition of the cache_is_vivt() check). Maybe
> something like
> 
> 	if (IS_ENABLED(CONFIG_HIGHMEM) && addr)
> 
> to avoid the #ifdef.

Yes, IS_ENABLED is much nicer. However, IS_ENABLED apparently is not
available for <= 3.0.x. :-(
 
> > If we change the patch this way, there is no need to split off
> > __flush_kernel_dcache_page() from __flush_dcache_page(), which also
> > makes it simpler to apply to past stable kernels.
> > 
> > The only thing we need to be aware of is the change of
> > flush_kernel_dcache_page() in 2.6.37 (commit f8b63c1).  In earlier
> > versions, we would only need to fix the highmem case.  However, I
> > wonder whether it makes sense to apply a fix to 2.6.32 and 2.6.34
> > without this ever being reported as a problem.
> 
> I think you can cc stable with a dependency for 2.6.37 onwards and send
> a different patch for earlier versions directly to the
> stable@vger.kernel.org alias, specifying the versions it needs to be
> applied to.

Ok. Next version of the patch then for >= 3.2.x. 3.0.x is special (#ifdef).

2.6.3[24].x are special anyway. However, for the latter I still
wonder whether this makes sense. "stable_kernel_rules.txt" states:

 - It must fix a real bug that bothers people (not a, "This could be a
   problem..." type thing).

I don't know anyone who is/was bothered by the higmem bug. 

(Btw. I verified that the highmem problem is really there (using
current linux-next) and is fixed by the proposed patch.)

- Simon
diff mbox

Patch

diff --git a/arch/arm/include/asm/cacheflush.h b/arch/arm/include/asm/cacheflush.h
index bff7138..17d0ae8 100644
--- a/arch/arm/include/asm/cacheflush.h
+++ b/arch/arm/include/asm/cacheflush.h
@@ -320,9 +320,7 @@  static inline void flush_anon_page(struct vm_area_struct *vma,
 }
 
 #define ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE
-static inline void flush_kernel_dcache_page(struct page *page)
-{
-}
+extern void flush_kernel_dcache_page(struct page *);
 
 #define flush_dcache_mmap_lock(mapping) \
 	spin_lock_irq(&(mapping)->tree_lock)
diff --git a/arch/arm/mm/flush.c b/arch/arm/mm/flush.c
index 0d473cc..485ca96 100644
--- a/arch/arm/mm/flush.c
+++ b/arch/arm/mm/flush.c
@@ -160,13 +160,13 @@  void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
 #endif
 }
 
-void __flush_dcache_page(struct address_space *mapping, struct page *page)
+/*
+ * Writeback any data associated with the kernel mapping of this
+ * page.  This ensures that data in the physical page is mutually
+ * coherent with the kernel's mapping.
+ */
+static void __flush_kernel_dcache_page(struct page *page)
 {
-	/*
-	 * Writeback any data associated with the kernel mapping of this
-	 * page.  This ensures that data in the physical page is mutually
-	 * coherent with the kernels mapping.
-	 */
 	if (!PageHighMem(page)) {
 		__cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
 	} else {
@@ -184,6 +184,11 @@  void __flush_dcache_page(struct address_space *mapping, struct page *page)
 			}
 		}
 	}
+}
+
+void __flush_dcache_page(struct address_space *mapping, struct page *page)
+{
+	__flush_kernel_dcache_page(page);
 
 	/*
 	 * If this is a page cache page, and we have an aliasing VIPT cache,
@@ -301,6 +306,27 @@  void flush_dcache_page(struct page *page)
 EXPORT_SYMBOL(flush_dcache_page);
 
 /*
+ * Ensure cache coherency for kernel mapping of this page.
+ *
+ * If the page only exists in the page cache and there are no user
+ * space mappings, this is a no-op since the page was already marked
+ * dirty at creation.  Otherwise, we need to flush the dirty kernel
+ * cache lines directly.
+ */
+void flush_kernel_dcache_page(struct page *page)
+{
+	if (cache_is_vivt() || cache_is_vipt_aliasing()) {
+		struct address_space *mapping;
+
+		mapping = page_mapping(page);
+
+		if (!mapping || mapping_mapped(mapping))
+			__flush_kernel_dcache_page(page);
+	}
+}
+EXPORT_SYMBOL(flush_kernel_dcache_page);
+
+/*
  * Flush an anonymous page so that users of get_user_pages()
  * can safely access the data.  The expected sequence is:
  *