Message ID | 20250226183013.GB1042@cmpxchg.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | CONFIG_PT_RECLAIM | expand |
Hi Johannes, On 2/27/25 2:30 AM, Johannes Weiner wrote: > Does PT_RECLAIM need to be configurable by the user? The PT_RECLAIM will select MMU_GATHER_RCU_TABLE_FREE, but not all archs support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only supports MMU_GATHER_RCU_TABLE_FREE in the case of PARAVIRT. Therefore, PT_RECLAIM also implies the meaning of enabling MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. And I just thought that as a new feature, it would be better to give users the ability to turn it on and off. > > Why not always try to free the page tables if the arch supports it? If so, maybe changing select MMU_GATHER_RCU_TABLE_FREE to depends on MMU_GATHER_RCU_TABLE_FREE would be better? > > diff --git a/mm/Kconfig b/mm/Kconfig > index 2761098dbc1a..99383c93db33 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM > def_bool n > > config PT_RECLAIM > - bool "reclaim empty user page table pages" > - default y > + def_bool y > depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP > select MMU_GATHER_RCU_TABLE_FREE > - help > - Try to reclaim empty user page table pages in paths other than munmap > - and exit_mmap path. > - > - Note: now only empty user PTE page table pages will be reclaimed. > - Maybe keep the help information? Thanks, Qi > > source "mm/damon/Kconfig" >
On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote: > Hi Johannes, > > On 2/27/25 2:30 AM, Johannes Weiner wrote: > > Does PT_RECLAIM need to be configurable by the user? > > The PT_RECLAIM will select MMU_GATHER_RCU_TABLE_FREE, but not all archs > support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc > ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only > supports MMU_GATHER_RCU_TABLE_FREE in the case of PARAVIRT. > > Therefore, PT_RECLAIM also implies the meaning of enabling > MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. And I just > thought that as a new feature, it would be better to give users the > ability to turn it on and off. New *features*, yes - something that has a significant enough cost that clearly not all users want to pay for the benefits. But it's hard to imagine anybody would WANT to keep the page tables around if they madvised away all the pages inside of them. It's a great optimization, what would be a reason to opt out? > > diff --git a/mm/Kconfig b/mm/Kconfig > > index 2761098dbc1a..99383c93db33 100644 > > --- a/mm/Kconfig > > +++ b/mm/Kconfig > > @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM > > def_bool n > > > > config PT_RECLAIM > > - bool "reclaim empty user page table pages" > > - default y > > + def_bool y > > depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP > > select MMU_GATHER_RCU_TABLE_FREE > > - help > > - Try to reclaim empty user page table pages in paths other than munmap > > - and exit_mmap path. > > - > > - Note: now only empty user PTE page table pages will be reclaimed. > > - > > Maybe keep the help information? I don't find it very helpful :( Which "other paths?" It doesn't explain any pros and cons, and why anybody might choose to enable or disable it. The Note repeats what's in the sentence before it. Maybe I'm missing something. Could this not just be an #ifdef block inside mm/madvise.c, instead of living inside a new file with two new config symbols? #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE ... #endif Is there an arch-specific feature that it requires besides MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now? And why *does* it require MMU_GATHER_RCU_TABLE_FREE? Documentation/mm/process_addrs.rst explains why you need rcu, but there is free_pte_defer() that THP was using long before x86 needed MMU_GATHER_RCU_TABLE_FREE. It seems to me if you could use that, this feature would also work fine on architectures that do not generally need RCU for flush & frees otherwise. So is the main issue that there just isn't an explicitly deferred variant of pte_free_tlb()? If so, this is a fairly non-obvious dependency that should be documented. It would help somebody trying to port this to a !RCU mmu_gather arch. And I apologize if all this was discussed before. But if it was, the conclusions should be in the changelog or in code comments. This is a very delicate synchronization scheme that I think deserves explicit documentation somewhere.
Hi Johannes, On 2/27/25 2:08 PM, Johannes Weiner wrote: > On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote: >> Hi Johannes, >> >> On 2/27/25 2:30 AM, Johannes Weiner wrote: >>> Does PT_RECLAIM need to be configurable by the user? >> >> The PT_RECLAIM will select MMU_GATHER_RCU_TABLE_FREE, but not all archs >> support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc >> ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only >> supports MMU_GATHER_RCU_TABLE_FREE in the case of PARAVIRT. >> >> Therefore, PT_RECLAIM also implies the meaning of enabling >> MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. And I just >> thought that as a new feature, it would be better to give users the >> ability to turn it on and off. > > New *features*, yes - something that has a significant enough cost > that clearly not all users want to pay for the benefits. Got it. > > But it's hard to imagine anybody would WANT to keep the page tables > around if they madvised away all the pages inside of them. It's a > great optimization, what would be a reason to opt out? OK, now I think it makes sense to change it to 'def_bool y'. > >>> diff --git a/mm/Kconfig b/mm/Kconfig >>> index 2761098dbc1a..99383c93db33 100644 >>> --- a/mm/Kconfig >>> +++ b/mm/Kconfig >>> @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM >>> def_bool n >>> >>> config PT_RECLAIM >>> - bool "reclaim empty user page table pages" >>> - default y >>> + def_bool y >>> depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>> select MMU_GATHER_RCU_TABLE_FREE >>> - help >>> - Try to reclaim empty user page table pages in paths other than munmap >>> - and exit_mmap path. >>> - >>> - Note: now only empty user PTE page table pages will be reclaimed. >>> - >> >> Maybe keep the help information? > > I don't find it very helpful :( Which "other paths?" It doesn't > explain any pros and cons, and why anybody might choose to enable or > disable it. The Note repeats what's in the sentence before it. Sorry about that. :( > > Maybe I'm missing something. Could this not just be an #ifdef block > inside mm/madvise.c, instead of living inside a new file with two new > config symbols? > > #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE > ... > #endif > > Is there an arch-specific feature that it requires besides > MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now? No, it only needs MMU_GATHER_RCU_TABLE_FREE. > > And why *does* it require MMU_GATHER_RCU_TABLE_FREE? Because in the madvise(MADV_DONTNEED) path, mmu_gather has been used to batch flush tlb and free physical pages. It is a better choice to free PTE pages in this ways as well. And because PT_RECLAIM needs rcu, we need MMU_GATHER_RCU_TABLE_FREE to make pte_free_tlb() free PTE pages through rcu. Of course, we also need to modify __tlb_remove_table_one() to make it use rcu as well. > > Documentation/mm/process_addrs.rst explains why you need rcu, but > there is free_pte_defer() that THP was using long before x86 needed > MMU_GATHER_RCU_TABLE_FREE. It seems to me if you could use that, this > feature would also work fine on architectures that do not generally > need RCU for flush & frees otherwise. So is the main issue that there As mentioned above, we want to flush & frees in batches, so we don't use pte_free_defer(). > just isn't an explicitly deferred variant of pte_free_tlb()? The pte_free_defer() seems to have been adapted to all archs, so I wonder if all archs can support MMU_GATHER_RCU_TABLE_FREE, so that pte_free_tlb() will always use rcu to free PTE pages. Maybe I missed something. +Peter. > > If so, this is a fairly non-obvious dependency that should be > documented. It would help somebody trying to port this to a !RCU > mmu_gather arch. > > And I apologize if all this was discussed before. But if it was, the > conclusions should be in the changelog or in code comments. This is a > very delicate synchronization scheme that I think deserves explicit > documentation somewhere.
On 2/27/25 2:58 PM, Qi Zheng wrote: > Hi Johannes, > > On 2/27/25 2:08 PM, Johannes Weiner wrote: >> On Thu, Feb 27, 2025 at 11:04:51AM +0800, Qi Zheng wrote: >>> Hi Johannes, >>> >>> On 2/27/25 2:30 AM, Johannes Weiner wrote: >>>> Does PT_RECLAIM need to be configurable by the user? >>> >>> The PT_RECLAIM will select MMU_GATHER_RCU_TABLE_FREE, but not all archs >>> support MMU_GATHER_RCU_TABLE_FREE, and even before Rik's a37259732a7dc >>> ("x86/mm: Make MMU_GATHER_RCU_TABLE_FREE unconditional"), x86 only >>> supports MMU_GATHER_RCU_TABLE_FREE in the case of PARAVIRT. >>> >>> Therefore, PT_RECLAIM also implies the meaning of enabling >>> MMU_GATHER_RCU_TABLE_FREE, so I made it user-configurable. And I just >>> thought that as a new feature, it would be better to give users the >>> ability to turn it on and off. >> >> New *features*, yes - something that has a significant enough cost >> that clearly not all users want to pay for the benefits. > > Got it. > >> >> But it's hard to imagine anybody would WANT to keep the page tables >> around if they madvised away all the pages inside of them. It's a >> great optimization, what would be a reason to opt out? > > OK, now I think it makes sense to change it to 'def_bool y'. > >> >>>> diff --git a/mm/Kconfig b/mm/Kconfig >>>> index 2761098dbc1a..99383c93db33 100644 >>>> --- a/mm/Kconfig >>>> +++ b/mm/Kconfig >>>> @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM >>>> def_bool n >>>> config PT_RECLAIM >>>> - bool "reclaim empty user page table pages" >>>> - default y >>>> + def_bool y >>>> depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP >>>> select MMU_GATHER_RCU_TABLE_FREE >>>> - help >>>> - Try to reclaim empty user page table pages in paths other >>>> than munmap >>>> - and exit_mmap path. >>>> - >>>> - Note: now only empty user PTE page table pages will be >>>> reclaimed. >>>> - >>> >>> Maybe keep the help information? >> >> I don't find it very helpful :( Which "other paths?" It doesn't >> explain any pros and cons, and why anybody might choose to enable or >> disable it. The Note repeats what's in the sentence before it. > > Sorry about that. :( > >> >> Maybe I'm missing something. Could this not just be an #ifdef block >> inside mm/madvise.c, instead of living inside a new file with two new >> config symbols? >> >> #ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE >> ... >> #endif >> >> Is there an arch-specific feature that it requires besides >> MMU_GATHER_RCU_TABLE_FREE such that only x86 supports it now? > > No, it only needs MMU_GATHER_RCU_TABLE_FREE. > And I originally planned to support more architectures. The following link has some of my previous plans: https://lore.kernel.org/all/0ca36b2e-463e-493f-aede-aff9aec3c7fa@bytedance.com/ It would be better if more people could discuss and participate. ;) Thanks! >
On 26.02.25 19:30, Johannes Weiner wrote: > Does PT_RECLAIM need to be configurable by the user? > > Why not always try to free the page tables if the arch supports it? I recall that the reasoning was that it might negatively affect some workloads so we'd be a bit careful first. I'd be happy if we'd just unconditionally enable it.
diff --git a/mm/Kconfig b/mm/Kconfig index 2761098dbc1a..99383c93db33 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -1309,16 +1309,9 @@ config ARCH_SUPPORTS_PT_RECLAIM def_bool n config PT_RECLAIM - bool "reclaim empty user page table pages" - default y + def_bool y depends on ARCH_SUPPORTS_PT_RECLAIM && MMU && SMP select MMU_GATHER_RCU_TABLE_FREE - help - Try to reclaim empty user page table pages in paths other than munmap - and exit_mmap path. - - Note: now only empty user PTE page table pages will be reclaimed. - source "mm/damon/Kconfig"