diff mbox

[RFC,V4,6/7] arm64: mm: Enable HAVE_RCU_TABLE_FREE logic

Message ID 1396018892-6773-7-git-send-email-steve.capper@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Steve Capper March 28, 2014, 3:01 p.m. UTC
In order to implement fast_get_user_pages we need to ensure that the
page table walker is protected from page table pages being freed from
under it.

This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
to address spaces with multiple users will be call_rcu_sched freed.
Meaning that disabling interrupts will block the free and protect the
fast gup page walker.

Signed-off-by: Steve Capper <steve.capper@linaro.org>
---
 arch/arm64/Kconfig           | 1 +
 arch/arm64/include/asm/tlb.h | 8 ++++++++
 2 files changed, 9 insertions(+)

Comments

Catalin Marinas April 30, 2014, 3:20 p.m. UTC | #1
On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote:
> In order to implement fast_get_user_pages we need to ensure that the
> page table walker is protected from page table pages being freed from
> under it.
> 
> This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> to address spaces with multiple users will be call_rcu_sched freed.
> Meaning that disabling interrupts will block the free and protect the
> fast gup page walker.
> 
> Signed-off-by: Steve Capper <steve.capper@linaro.org>

While this patch is simple, I'd like to better understand the reason for
it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while
__get_user_pages_fast() is supported by a few other architectures that
don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on
arm/arm64 while not all the other archs need it?

Thanks.
Catalin Marinas April 30, 2014, 3:33 p.m. UTC | #2
On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote:
> On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote:
> > In order to implement fast_get_user_pages we need to ensure that the
> > page table walker is protected from page table pages being freed from
> > under it.
> > 
> > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> > to address spaces with multiple users will be call_rcu_sched freed.
> > Meaning that disabling interrupts will block the free and protect the
> > fast gup page walker.
> > 
> > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> 
> While this patch is simple, I'd like to better understand the reason for
> it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while
> __get_user_pages_fast() is supported by a few other architectures that
> don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on
> arm/arm64 while not all the other archs need it?

OK, replying to myself. I assume the other architectures that don't need
HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast
synchronisation for free.
Steve Capper April 30, 2014, 3:38 p.m. UTC | #3
On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote:
> On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote:
> > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote:
> > > In order to implement fast_get_user_pages we need to ensure that the
> > > page table walker is protected from page table pages being freed from
> > > under it.
> > > 
> > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> > > to address spaces with multiple users will be call_rcu_sched freed.
> > > Meaning that disabling interrupts will block the free and protect the
> > > fast gup page walker.
> > > 
> > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > 
> > While this patch is simple, I'd like to better understand the reason for
> > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while
> > __get_user_pages_fast() is supported by a few other architectures that
> > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on
> > arm/arm64 while not all the other archs need it?
> 
> OK, replying to myself. I assume the other architectures that don't need
> HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast
> synchronisation for free.

Hi Catalin,
Yes that is roughly the case.
Essentially we want to RCU free the page table backing pages at a
later time when we aren't walking on them.

Other arches use IPI, some others have their own RCU logic. I opted to
activate some existing logic to reduce code duplication.

Cheers,
Catalin Marinas April 30, 2014, 5:21 p.m. UTC | #4
On Wed, Apr 30, 2014 at 04:38:25PM +0100, Steve Capper wrote:
> On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote:
> > On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote:
> > > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote:
> > > > In order to implement fast_get_user_pages we need to ensure that the
> > > > page table walker is protected from page table pages being freed from
> > > > under it.
> > > > 
> > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> > > > to address spaces with multiple users will be call_rcu_sched freed.
> > > > Meaning that disabling interrupts will block the free and protect the
> > > > fast gup page walker.
> > > > 
> > > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > 
> > > While this patch is simple, I'd like to better understand the reason for
> > > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while
> > > __get_user_pages_fast() is supported by a few other architectures that
> > > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on
> > > arm/arm64 while not all the other archs need it?
> > 
> > OK, replying to myself. I assume the other architectures that don't need
> > HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast
> > synchronisation for free.
> 
> Yes that is roughly the case.
> Essentially we want to RCU free the page table backing pages at a
> later time when we aren't walking on them.
> 
> Other arches use IPI, some others have their own RCU logic. I opted to
> activate some existing logic to reduce code duplication.

Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb()
etc. which implies an IPI for synchronisation if mm_users > 1. For
gup_fast we may not need it since we use the RCU for protection. Am I
missing anything?
Steve Capper May 1, 2014, 7:34 a.m. UTC | #5
On Wed, Apr 30, 2014 at 06:21:14PM +0100, Catalin Marinas wrote:
> On Wed, Apr 30, 2014 at 04:38:25PM +0100, Steve Capper wrote:
> > On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote:
> > > On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote:
> > > > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote:
> > > > > In order to implement fast_get_user_pages we need to ensure that the
> > > > > page table walker is protected from page table pages being freed from
> > > > > under it.
> > > > > 
> > > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging
> > > > > to address spaces with multiple users will be call_rcu_sched freed.
> > > > > Meaning that disabling interrupts will block the free and protect the
> > > > > fast gup page walker.
> > > > > 
> > > > > Signed-off-by: Steve Capper <steve.capper@linaro.org>
> > > > 
> > > > While this patch is simple, I'd like to better understand the reason for
> > > > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while
> > > > __get_user_pages_fast() is supported by a few other architectures that
> > > > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on
> > > > arm/arm64 while not all the other archs need it?
> > > 
> > > OK, replying to myself. I assume the other architectures that don't need
> > > HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast
> > > synchronisation for free.
> > 
> > Yes that is roughly the case.
> > Essentially we want to RCU free the page table backing pages at a
> > later time when we aren't walking on them.
> > 
> > Other arches use IPI, some others have their own RCU logic. I opted to
> > activate some existing logic to reduce code duplication.
> 
> Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb()
> etc. which implies an IPI for synchronisation if mm_users > 1. For
> gup_fast we may not need it since we use the RCU for protection. Am I
> missing anything?

So my understanding is:

tlb_remove_table will just immediately free any pages where there's a
single user as there's no need to consider a gup walking.

For the case of multiple users we have an mmu_table_batch structure
that holds references to pages that should be freed at a later point.

This batch is contained on a page that is allocated on the fly. If, for
any reason, we can't allocate the batch container we fallback to a slow
path which is to issue an IPI (via tlb_remove_table_one). This IPI will
block on the gup walker. We need this fallback behaviour on ARM/ARM64.

Most of the time we will be able to allocate the batch container, and
we will populate it with references to page table containing pages that
are freed via an RCU scheduler delayed callback to tlb_remove_table_rcu.

In the fast_gup walker, we block tlb_remove_table_rcu from running by
disabling interrupts in the critical path. Technically we could issue
a call to rcu_read_lock_sched instead to block tlb_remove_table_rcu,
but that wouldn't be sufficient to block THP splits; so we opt to
disable interrupts to block both THP and tlb_remove_table_rcu.

Cheers,
Catalin Marinas May 1, 2014, 9:52 a.m. UTC | #6
On Thu, May 01, 2014 at 08:34:03AM +0100, Steve Capper wrote:
> On Wed, Apr 30, 2014 at 06:21:14PM +0100, Catalin Marinas wrote:
> > Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb()
> > etc. which implies an IPI for synchronisation if mm_users > 1. For
> > gup_fast we may not need it since we use the RCU for protection. Am I
> > missing anything?
> 
> So my understanding is:
> 
> tlb_remove_table will just immediately free any pages where there's a
> single user as there's no need to consider a gup walking.

Does gup_fast walking increment the mm_users? Or is it a requirement of
the calling code? I can't seem to find where this happens.

> For the case of multiple users we have an mmu_table_batch structure
> that holds references to pages that should be freed at a later point.

Yes.

> This batch is contained on a page that is allocated on the fly. If, for
> any reason, we can't allocate the batch container we fallback to a slow
> path which is to issue an IPI (via tlb_remove_table_one). This IPI will
> block on the gup walker. We need this fallback behaviour on ARM/ARM64.

That's my main point: this batch page allocation on the fly for table
pages happens in tlb_remove_table(). With your patch for arm64
HAVE_RCU_TABLE_FREE, I can comment out tlb_remove_table() and it
compiles just fine because you don't call it from functions like
__pte_free_tlb() (as powerpc and sparc do). The __tlb_remove_page() that
we currently use doesn't give us any RCU protection here.
Peter Zijlstra May 1, 2014, 9:57 a.m. UTC | #7
On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote:
> Does gup_fast walking increment the mm_users? Or is it a requirement of
> the calling code? I can't seem to find where this happens.

No, its not required at all. One should only walk current->mm with
gup_fast, any other usage is broken.

And by delaying TLB shootdown, either through disabling IRQs and
stalling IPIs or by using RCU freeing, you're guaranteed your own page
tables won't disappear underneath your feet.
Catalin Marinas May 1, 2014, 10:04 a.m. UTC | #8
On Thu, May 01, 2014 at 10:57:39AM +0100, Peter Zijlstra wrote:
> On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote:
> > Does gup_fast walking increment the mm_users? Or is it a requirement of
> > the calling code? I can't seem to find where this happens.
> 
> No, its not required at all. One should only walk current->mm with
> gup_fast, any other usage is broken.

OK, I get it now.

> And by delaying TLB shootdown, either through disabling IRQs and
> stalling IPIs or by using RCU freeing, you're guaranteed your own page
> tables won't disappear underneath your feet.

And for RCU to work, we still need to use the full tlb_remove_table()
logic (Steve's patches just use tlb_remove_page() for table freeing).

Thanks.
Steve Capper May 1, 2014, 10:15 a.m. UTC | #9
On Thu, May 01, 2014 at 11:04:15AM +0100, Catalin Marinas wrote:
> On Thu, May 01, 2014 at 10:57:39AM +0100, Peter Zijlstra wrote:
> > On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote:
> > > Does gup_fast walking increment the mm_users? Or is it a requirement of
> > > the calling code? I can't seem to find where this happens.
> > 
> > No, its not required at all. One should only walk current->mm with
> > gup_fast, any other usage is broken.
> 
> OK, I get it now.
> 
> > And by delaying TLB shootdown, either through disabling IRQs and
> > stalling IPIs or by using RCU freeing, you're guaranteed your own page
> > tables won't disappear underneath your feet.
> 
> And for RCU to work, we still need to use the full tlb_remove_table()
> logic (Steve's patches just use tlb_remove_page() for table freeing).

Yes, I see.
This is a bug in the arm64 patch (arm correctly calls tlb_remove_page),
I think it got ate during a rebase.

I will fix the arm64 activation logic.

Apologies for the confustion,
diff mbox

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27bbcfc..6185f95 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -38,6 +38,7 @@  config ARM64
 	select HAVE_MEMBLOCK
 	select HAVE_PATA_PLATFORM
 	select HAVE_PERF_EVENTS
+	select HAVE_RCU_TABLE_FREE
 	select IRQ_DOMAIN
 	select MODULES_USE_ELF_RELA
 	select NO_BOOTMEM
diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h
index 72cadf5..58a8b78 100644
--- a/arch/arm64/include/asm/tlb.h
+++ b/arch/arm64/include/asm/tlb.h
@@ -22,6 +22,14 @@ 
 
 #include <asm-generic/tlb.h>
 
+#include <linux/pagemap.h>
+#include <linux/swap.h>
+
+static inline void __tlb_remove_table(void *_table)
+{
+	free_page_and_swap_cache((struct page *)_table);
+}
+
 /*
  * There's three ways the TLB shootdown code is used:
  *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().