Message ID | 1396018892-6773-7-git-send-email-steve.capper@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote: > In order to implement fast_get_user_pages we need to ensure that the > page table walker is protected from page table pages being freed from > under it. > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging > to address spaces with multiple users will be call_rcu_sched freed. > Meaning that disabling interrupts will block the free and protect the > fast gup page walker. > > Signed-off-by: Steve Capper <steve.capper@linaro.org> While this patch is simple, I'd like to better understand the reason for it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while __get_user_pages_fast() is supported by a few other architectures that don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on arm/arm64 while not all the other archs need it? Thanks.
On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote: > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote: > > In order to implement fast_get_user_pages we need to ensure that the > > page table walker is protected from page table pages being freed from > > under it. > > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging > > to address spaces with multiple users will be call_rcu_sched freed. > > Meaning that disabling interrupts will block the free and protect the > > fast gup page walker. > > > > Signed-off-by: Steve Capper <steve.capper@linaro.org> > > While this patch is simple, I'd like to better understand the reason for > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while > __get_user_pages_fast() is supported by a few other architectures that > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on > arm/arm64 while not all the other archs need it? OK, replying to myself. I assume the other architectures that don't need HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast synchronisation for free.
On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote: > On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote: > > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote: > > > In order to implement fast_get_user_pages we need to ensure that the > > > page table walker is protected from page table pages being freed from > > > under it. > > > > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging > > > to address spaces with multiple users will be call_rcu_sched freed. > > > Meaning that disabling interrupts will block the free and protect the > > > fast gup page walker. > > > > > > Signed-off-by: Steve Capper <steve.capper@linaro.org> > > > > While this patch is simple, I'd like to better understand the reason for > > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while > > __get_user_pages_fast() is supported by a few other architectures that > > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on > > arm/arm64 while not all the other archs need it? > > OK, replying to myself. I assume the other architectures that don't need > HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast > synchronisation for free. Hi Catalin, Yes that is roughly the case. Essentially we want to RCU free the page table backing pages at a later time when we aren't walking on them. Other arches use IPI, some others have their own RCU logic. I opted to activate some existing logic to reduce code duplication. Cheers,
On Wed, Apr 30, 2014 at 04:38:25PM +0100, Steve Capper wrote: > On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote: > > On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote: > > > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote: > > > > In order to implement fast_get_user_pages we need to ensure that the > > > > page table walker is protected from page table pages being freed from > > > > under it. > > > > > > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging > > > > to address spaces with multiple users will be call_rcu_sched freed. > > > > Meaning that disabling interrupts will block the free and protect the > > > > fast gup page walker. > > > > > > > > Signed-off-by: Steve Capper <steve.capper@linaro.org> > > > > > > While this patch is simple, I'd like to better understand the reason for > > > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while > > > __get_user_pages_fast() is supported by a few other architectures that > > > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on > > > arm/arm64 while not all the other archs need it? > > > > OK, replying to myself. I assume the other architectures that don't need > > HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast > > synchronisation for free. > > Yes that is roughly the case. > Essentially we want to RCU free the page table backing pages at a > later time when we aren't walking on them. > > Other arches use IPI, some others have their own RCU logic. I opted to > activate some existing logic to reduce code duplication. Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb() etc. which implies an IPI for synchronisation if mm_users > 1. For gup_fast we may not need it since we use the RCU for protection. Am I missing anything?
On Wed, Apr 30, 2014 at 06:21:14PM +0100, Catalin Marinas wrote: > On Wed, Apr 30, 2014 at 04:38:25PM +0100, Steve Capper wrote: > > On Wed, Apr 30, 2014 at 04:33:17PM +0100, Catalin Marinas wrote: > > > On Wed, Apr 30, 2014 at 04:20:47PM +0100, Catalin Marinas wrote: > > > > On Fri, Mar 28, 2014 at 03:01:31PM +0000, Steve Capper wrote: > > > > > In order to implement fast_get_user_pages we need to ensure that the > > > > > page table walker is protected from page table pages being freed from > > > > > under it. > > > > > > > > > > This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging > > > > > to address spaces with multiple users will be call_rcu_sched freed. > > > > > Meaning that disabling interrupts will block the free and protect the > > > > > fast gup page walker. > > > > > > > > > > Signed-off-by: Steve Capper <steve.capper@linaro.org> > > > > > > > > While this patch is simple, I'd like to better understand the reason for > > > > it. Currently HAVE_RCU_TABLE_FREE is enabled for powerpc and sparc while > > > > __get_user_pages_fast() is supported by a few other architectures that > > > > don't select HAVE_RCU_TABLE_FREE. So why do we need it for fast gup on > > > > arm/arm64 while not all the other archs need it? > > > > > > OK, replying to myself. I assume the other architectures that don't need > > > HAVE_RCU_TABLE_FREE use IPI for TLB shootdown, hence they gup_fast > > > synchronisation for free. > > > > Yes that is roughly the case. > > Essentially we want to RCU free the page table backing pages at a > > later time when we aren't walking on them. > > > > Other arches use IPI, some others have their own RCU logic. I opted to > > activate some existing logic to reduce code duplication. > > Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb() > etc. which implies an IPI for synchronisation if mm_users > 1. For > gup_fast we may not need it since we use the RCU for protection. Am I > missing anything? So my understanding is: tlb_remove_table will just immediately free any pages where there's a single user as there's no need to consider a gup walking. For the case of multiple users we have an mmu_table_batch structure that holds references to pages that should be freed at a later point. This batch is contained on a page that is allocated on the fly. If, for any reason, we can't allocate the batch container we fallback to a slow path which is to issue an IPI (via tlb_remove_table_one). This IPI will block on the gup walker. We need this fallback behaviour on ARM/ARM64. Most of the time we will be able to allocate the batch container, and we will populate it with references to page table containing pages that are freed via an RCU scheduler delayed callback to tlb_remove_table_rcu. In the fast_gup walker, we block tlb_remove_table_rcu from running by disabling interrupts in the critical path. Technically we could issue a call to rcu_read_lock_sched instead to block tlb_remove_table_rcu, but that wouldn't be sufficient to block THP splits; so we opt to disable interrupts to block both THP and tlb_remove_table_rcu. Cheers,
On Thu, May 01, 2014 at 08:34:03AM +0100, Steve Capper wrote: > On Wed, Apr 30, 2014 at 06:21:14PM +0100, Catalin Marinas wrote: > > Both powerpc and sparc use tlb_remove_table() via their __pte_free_tlb() > > etc. which implies an IPI for synchronisation if mm_users > 1. For > > gup_fast we may not need it since we use the RCU for protection. Am I > > missing anything? > > So my understanding is: > > tlb_remove_table will just immediately free any pages where there's a > single user as there's no need to consider a gup walking. Does gup_fast walking increment the mm_users? Or is it a requirement of the calling code? I can't seem to find where this happens. > For the case of multiple users we have an mmu_table_batch structure > that holds references to pages that should be freed at a later point. Yes. > This batch is contained on a page that is allocated on the fly. If, for > any reason, we can't allocate the batch container we fallback to a slow > path which is to issue an IPI (via tlb_remove_table_one). This IPI will > block on the gup walker. We need this fallback behaviour on ARM/ARM64. That's my main point: this batch page allocation on the fly for table pages happens in tlb_remove_table(). With your patch for arm64 HAVE_RCU_TABLE_FREE, I can comment out tlb_remove_table() and it compiles just fine because you don't call it from functions like __pte_free_tlb() (as powerpc and sparc do). The __tlb_remove_page() that we currently use doesn't give us any RCU protection here.
On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote: > Does gup_fast walking increment the mm_users? Or is it a requirement of > the calling code? I can't seem to find where this happens. No, its not required at all. One should only walk current->mm with gup_fast, any other usage is broken. And by delaying TLB shootdown, either through disabling IRQs and stalling IPIs or by using RCU freeing, you're guaranteed your own page tables won't disappear underneath your feet.
On Thu, May 01, 2014 at 10:57:39AM +0100, Peter Zijlstra wrote: > On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote: > > Does gup_fast walking increment the mm_users? Or is it a requirement of > > the calling code? I can't seem to find where this happens. > > No, its not required at all. One should only walk current->mm with > gup_fast, any other usage is broken. OK, I get it now. > And by delaying TLB shootdown, either through disabling IRQs and > stalling IPIs or by using RCU freeing, you're guaranteed your own page > tables won't disappear underneath your feet. And for RCU to work, we still need to use the full tlb_remove_table() logic (Steve's patches just use tlb_remove_page() for table freeing). Thanks.
On Thu, May 01, 2014 at 11:04:15AM +0100, Catalin Marinas wrote: > On Thu, May 01, 2014 at 10:57:39AM +0100, Peter Zijlstra wrote: > > On Thu, May 01, 2014 at 10:52:47AM +0100, Catalin Marinas wrote: > > > Does gup_fast walking increment the mm_users? Or is it a requirement of > > > the calling code? I can't seem to find where this happens. > > > > No, its not required at all. One should only walk current->mm with > > gup_fast, any other usage is broken. > > OK, I get it now. > > > And by delaying TLB shootdown, either through disabling IRQs and > > stalling IPIs or by using RCU freeing, you're guaranteed your own page > > tables won't disappear underneath your feet. > > And for RCU to work, we still need to use the full tlb_remove_table() > logic (Steve's patches just use tlb_remove_page() for table freeing). Yes, I see. This is a bug in the arm64 patch (arm correctly calls tlb_remove_page), I think it got ate during a rebase. I will fix the arm64 activation logic. Apologies for the confustion,
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 27bbcfc..6185f95 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -38,6 +38,7 @@ config ARM64 select HAVE_MEMBLOCK select HAVE_PATA_PLATFORM select HAVE_PERF_EVENTS + select HAVE_RCU_TABLE_FREE select IRQ_DOMAIN select MODULES_USE_ELF_RELA select NO_BOOTMEM diff --git a/arch/arm64/include/asm/tlb.h b/arch/arm64/include/asm/tlb.h index 72cadf5..58a8b78 100644 --- a/arch/arm64/include/asm/tlb.h +++ b/arch/arm64/include/asm/tlb.h @@ -22,6 +22,14 @@ #include <asm-generic/tlb.h> +#include <linux/pagemap.h> +#include <linux/swap.h> + +static inline void __tlb_remove_table(void *_table) +{ + free_page_and_swap_cache((struct page *)_table); +} + /* * There's three ways the TLB shootdown code is used: * 1. Unmapping a range of vmas. See zap_page_range(), unmap_region().
In order to implement fast_get_user_pages we need to ensure that the page table walker is protected from page table pages being freed from under it. This patch enables HAVE_RCU_TABLE_FREE, any page table pages belonging to address spaces with multiple users will be call_rcu_sched freed. Meaning that disabling interrupts will block the free and protect the fast gup page walker. Signed-off-by: Steve Capper <steve.capper@linaro.org> --- arch/arm64/Kconfig | 1 + arch/arm64/include/asm/tlb.h | 8 ++++++++ 2 files changed, 9 insertions(+)