[v2] arm64: enable THP_SWAP for arm64

Message ID 20220527100644.293717-1-21cnbao@gmail.com (mailing list archive)
State New

Commit Message

Barry Song May 27, 2022, 10:06 a.m. UTC
From: Barry Song <v-songbaohua@oppo.com>

THP_SWAP has been proved to improve the swap throughput significantly
on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
splitting THP after swapped out").
As long as arm64 uses 4K page size, it is quite similar with x86_64
by having 2MB PMD THP. So we are going to get similar improvement.
For other page sizes such as 16KB and 64KB, PMD might be too large.
Negative side effects such as IO latency might be a problem. Thus,
we can only safely enable the counterpart of X86_64.
A corner case is that MTE has an assumption that only base pages
can be swapped. We won't enable THP_SWP for ARM64 hardware with
MTE support until MTE is re-arched.

Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Barry Song <v-songbaohua@oppo.com>
---
 arch/arm64/Kconfig               |  1 +
 arch/arm64/include/asm/pgtable.h |  2 ++
 include/linux/huge_mm.h          | 12 ++++++++++++
 mm/swap_slots.c                  |  2 +-
 4 files changed, 16 insertions(+), 1 deletion(-)

Comments

Anshuman Khandual May 30, 2022, 7:07 a.m. UTC | #1
Hello Barry,

On 5/27/22 15:36, Barry Song wrote:
> From: Barry Song <v-songbaohua@oppo.com>
> 
> THP_SWAP has been proved to improve the swap throughput significantly
> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
> splitting THP after swapped out").
It will be useful to run similar experiments on an arm64 platform to
demonstrate a tangible benefit; otherwise we might just be enabling
this feature because x86 has it. Do you have some data points?

> As long as arm64 uses 4K page size, it is quite similar with x86_64
> by having 2MB PMD THP. So we are going to get similar improvement.

This is an assumption without any data points (until now). Please
do provide some results.

> For other page sizes such as 16KB and 64KB, PMD might be too large.
> Negative side effects such as IO latency might be a problem. Thus,
> we can only safely enable the counterpart of X86_64.

Incorrect reasoning. Although it might sometimes be okay to enable
a feature on a platform based on assumptions about its benefits,
claiming 'similar improvement, safely, .. etc' while comparing
against the x86 4K page config without data points is not very
helpful.

> A corner case is that MTE has an assumption that only base pages
> can be swapped. We won't enable THP_SWP for ARM64 hardware with
> MTE support until MTE is re-arched.

re-arched?? Did you imply that MTE is reworked to support THP?

> 
> Cc: "Huang, Ying" <ying.huang@intel.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> ---
>  arch/arm64/Kconfig               |  1 +
>  arch/arm64/include/asm/pgtable.h |  2 ++
>  include/linux/huge_mm.h          | 12 ++++++++++++
>  mm/swap_slots.c                  |  2 +-
>  4 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a4968845e67f..5306009df2dc 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -101,6 +101,7 @@ config ARM64
>  	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>  	select ARCH_WANT_LD_ORPHAN_WARN
>  	select ARCH_WANTS_NO_INSTR
> +	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
>  	select ARCH_HAS_UBSAN_SANITIZE_ALL
>  	select ARM_AMBA
>  	select ARM_ARCH_TIMER
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 0b6632f18364..06076139c72c 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -45,6 +45,8 @@
>  	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> +#define arch_thp_swp_supported !system_supports_mte

Does it check for 'system_supports_mte' as a symbol, or call
system_supports_mte() to ascertain runtime MTE support? It might
well be correct, but it just does not look very intuitive.

> +
>  /*
>   * Outside of a few very special situations (e.g. hibernation), we always
>   * use broadcast TLB invalidation instructions, therefore a spurious page
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index de29821231c9..4ddaf6ad73ef 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio,
>  	return split_huge_page_to_list(&folio->page, list);
>  }
>  
> +/*
> + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> + * limitations in the implementation like arm64 MTE can override this to

A small nit.

A 'comma' here will be better. s/arm64 MTE can/arm64 MTE, can/

> + * false

Similarly a 'full stop' here will be better as well.

> + */
> +#ifndef arch_thp_swp_supported
> +static inline bool arch_thp_swp_supported(void)
> +{
> +	return true;
> +}
> +#endif
> +
>  #endif /* _LINUX_HUGE_MM_H */
> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> index 2a65a89b5b4d..10b94d64cc25 100644
> --- a/mm/swap_slots.c
> +++ b/mm/swap_slots.c
> @@ -307,7 +307,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
>  	entry.val = 0;
>  
>  	if (folio_test_large(folio)) {
> -		if (IS_ENABLED(CONFIG_THP_SWAP))
> +		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
>  			get_swap_pages(1, &entry, folio_nr_pages(folio));
>  		goto out;
>  	}

- Anshuman
Barry Song May 30, 2022, 9:53 a.m. UTC | #2
On Mon, May 30, 2022 at 7:07 PM Anshuman Khandual
<anshuman.khandual@arm.com> wrote:
>
> Hello Barry,

Hi Anshuman,
thanks!

>
> On 5/27/22 15:36, Barry Song wrote:
> > From: Barry Song <v-songbaohua@oppo.com>
> >
> > THP_SWAP has been proved to improve the swap throughput significantly
> > on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
> > splitting THP after swapped out").
> It will be useful to run similar experiments on an arm64 platform to
> demonstrate a tangible benefit; otherwise we might just be enabling
> this feature because x86 has it. Do you have some data points?
>
> > As long as arm64 uses 4K page size, it is quite similar with x86_64
> > by having 2MB PMD THP. So we are going to get similar improvement.
>
> This is an assumption without any data points (until now). Please
> do provide some results.

Fair enough, though I believe THP_SWP is arch-independent. We will
post data from our testing. Plus, we do need it for real use cases,
with some possible out-of-tree code for the moment, so this patch
does not originate only because x86 has it :-)

>
> > For other page sizes such as 16KB and 64KB, PMD might be too large.
> > Negative side effects such as IO latency might be a problem. Thus,
> > we can only safely enable the counterpart of X86_64.
>
> Incorrect reasoning. Although it might sometimes be okay to enable
> a feature on a platform based on assumptions about its benefits,
> claiming 'similar improvement, safely, .. etc' while comparing
> against the x86 4K page config without data points is not very
> helpful.
>
> > A corner case is that MTE has an assumption that only base pages
> > can be swapped. We won't enable THP_SWP for ARM64 hardware with
> > MTE support until MTE is re-arched.
>
> re-arched?? Did you imply that MTE is reworked to support THP?

I think at least MTE should be able to coexist with THP_SWP, though
I am not quite sure whether MTE can be reworked to fully support THP.

>
> >
> > Cc: "Huang, Ying" <ying.huang@intel.com>
> > Cc: Minchan Kim <minchan@kernel.org>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Signed-off-by: Barry Song <v-songbaohua@oppo.com>
> > ---
> >  arch/arm64/Kconfig               |  1 +
> >  arch/arm64/include/asm/pgtable.h |  2 ++
> >  include/linux/huge_mm.h          | 12 ++++++++++++
> >  mm/swap_slots.c                  |  2 +-
> >  4 files changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index a4968845e67f..5306009df2dc 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -101,6 +101,7 @@ config ARM64
> >       select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
> >       select ARCH_WANT_LD_ORPHAN_WARN
> >       select ARCH_WANTS_NO_INSTR
> > +     select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
> >       select ARCH_HAS_UBSAN_SANITIZE_ALL
> >       select ARM_AMBA
> >       select ARM_ARCH_TIMER
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 0b6632f18364..06076139c72c 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -45,6 +45,8 @@
> >       __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
> >  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
> >
> > +#define arch_thp_swp_supported !system_supports_mte
>
> Does it check for 'system_supports_mte' as a symbol, or call
> system_supports_mte() to ascertain runtime MTE support? It might
> well be correct, but it just does not look very intuitive.

Yep, it looks a bit weird. But considering we only need this for
arm64, and arch_thp_swp_supported is a macro, I can't find a better
way to keep the code modification in mm core, arm64 and x86 smaller
than this. And we will probably remove it entirely once we make MTE
coexist with THP_SWP.

Do you have any suggestions for a better solution?

>
> > +
> >  /*
> >   * Outside of a few very special situations (e.g. hibernation), we always
> >   * use broadcast TLB invalidation instructions, therefore a spurious page
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index de29821231c9..4ddaf6ad73ef 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio,
> >       return split_huge_page_to_list(&folio->page, list);
> >  }
> >
> > +/*
> > + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> > + * limitations in the implementation like arm64 MTE can override this to
>
> A small nit.
>
> A 'comma' here will be better. s/arm64 MTE can/arm64 MTE, can/

yep.

>
> > + * false
>
> Similarly a 'full stop' here will be better as well.
>

yep.

> > + */
> > +#ifndef arch_thp_swp_supported
> > +static inline bool arch_thp_swp_supported(void)
> > +{
> > +     return true;
> > +}
> > +#endif
> > +
> >  #endif /* _LINUX_HUGE_MM_H */
> > diff --git a/mm/swap_slots.c b/mm/swap_slots.c
> > index 2a65a89b5b4d..10b94d64cc25 100644
> > --- a/mm/swap_slots.c
> > +++ b/mm/swap_slots.c
> > @@ -307,7 +307,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
> >       entry.val = 0;
> >
> >       if (folio_test_large(folio)) {
> > -             if (IS_ENABLED(CONFIG_THP_SWAP))
> > +             if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
> >                       get_swap_pages(1, &entry, folio_nr_pages(folio));
> >               goto out;
> >       }
>
> - Anshuman

Thanks
Barry
Steven Price May 30, 2022, 11:09 a.m. UTC | #3
On 30/05/2022 10:53, Barry Song wrote:
> On Mon, May 30, 2022 at 7:07 PM Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>> Hello Barry,
> 
> Hi Anshuman,
> thanks!
> 
>>
>> On 5/27/22 15:36, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> THP_SWAP has been proved to improve the swap throughput significantly
>>> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
>>> splitting THP after swapped out").
>> It will be useful to run similar experiments on an arm64 platform to
>> demonstrate a tangible benefit; otherwise we might just be enabling
>> this feature because x86 has it. Do you have some data points?
>>
>>> As long as arm64 uses 4K page size, it is quite similar with x86_64
>>> by having 2MB PMD THP. So we are going to get similar improvement.
>>
>> This is an assumption without any data points (until now). Please
>> do provide some results.
> 
> Fair enough, though I believe THP_SWP is arch-independent. We will
> post data from our testing. Plus, we do need it for real use cases,
> with some possible out-of-tree code for the moment, so this patch
> does not originate only because x86 has it :-)
> 
>>
>>> For other page sizes such as 16KB and 64KB, PMD might be too large.
>>> Negative side effects such as IO latency might be a problem. Thus,
>>> we can only safely enable the counterpart of X86_64.
>>
>> Incorrect reasoning. Although it might sometimes be okay to enable
>> a feature on a platform based on assumptions about its benefits,
>> claiming 'similar improvement, safely, .. etc' while comparing
>> against the x86 4K page config without data points is not very
>> helpful.
>>
>>> A corner case is that MTE has an assumption that only base pages
>>> can be swapped. We won't enable THP_SWP for ARM64 hardware with
>>> MTE support until MTE is re-arched.
>>
>> re-arched?? Did you imply that MTE is reworked to support THP?
> 
> I think at least MTE should be able to coexist with THP_SWP, though
> I am not quite sure whether MTE can be reworked to fully support THP.

There's no fundamental reason it cannot coexist, but there are many open
areas around MTE support in general. For example at the moment swap
support keeps the tags in memory because there's no easy way to plumb
the extra tag data into the swap infrastructure.

The lazy zeroing of MTE tag storage has introduced a lot of complexity
and THP is another case where this complexity would show. It's possible
that it might make sense to take the hit of clearing tags in all pages
(i.e. make clear_page() clear the tags like mte_zero_clear_page_tags()).
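
For reference, the current base-page swap path keeps tags alive with
arch hooks along these lines (sketching from memory of
arch/arm64/include/asm/pgtable.h, so take the exact details with a
grain of salt):

  /* Save tags to a separate store before the page goes out to swap. */
  static inline int arch_prepare_to_swap(struct page *page)
  {
  	if (system_supports_mte())
  		return mte_save_tags(page);
  	return 0;
  }

  /* Restore tags when the page is swapped back in. */
  static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
  {
  	if (system_supports_mte() && mte_restore_tags(entry, page))
  		set_bit(PG_mte_tagged, &page->flags);
  }

Both hooks assume a single base page, which is why THP_SWP has to be
gated off while MTE is enabled.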

>>
>>>
>>> Cc: "Huang, Ying" <ying.huang@intel.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Cc: Hugh Dickins <hughd@google.com>
>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>> Cc: Steven Price <steven.price@arm.com>
>>> Cc: Yang Shi <shy828301@gmail.com>
>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>> ---
>>>  arch/arm64/Kconfig               |  1 +
>>>  arch/arm64/include/asm/pgtable.h |  2 ++
>>>  include/linux/huge_mm.h          | 12 ++++++++++++
>>>  mm/swap_slots.c                  |  2 +-
>>>  4 files changed, 16 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>> index a4968845e67f..5306009df2dc 100644
>>> --- a/arch/arm64/Kconfig
>>> +++ b/arch/arm64/Kconfig
>>> @@ -101,6 +101,7 @@ config ARM64
>>>       select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>>>       select ARCH_WANT_LD_ORPHAN_WARN
>>>       select ARCH_WANTS_NO_INSTR
>>> +     select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
>>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>       select ARM_AMBA
>>>       select ARM_ARCH_TIMER
>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>> index 0b6632f18364..06076139c72c 100644
>>> --- a/arch/arm64/include/asm/pgtable.h
>>> +++ b/arch/arm64/include/asm/pgtable.h
>>> @@ -45,6 +45,8 @@
>>>       __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
>>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>
>>> +#define arch_thp_swp_supported !system_supports_mte
>>
>> Does it check for 'system_supports_mte' as a symbol, or call
>> system_supports_mte() to ascertain runtime MTE support? It might
>> well be correct, but it just does not look very intuitive.
> 
> Yep, it looks a bit weird. But considering we only need this for
> arm64, and arch_thp_swp_supported is a macro, I can't find a better
> way to keep the code modification in mm core, arm64 and x86 smaller
> than this. And we will probably remove it entirely once we make MTE
> coexist with THP_SWP.
> 
> Do you have any suggestions for a better solution?

It would be better to write it as a function macro:

  #define arch_thp_swp_supported() (!system_supports_mte())

or you could go the whole way and introduce a static inline function
(overkill in this case IMHO):

  static inline bool arch_thp_swp_supported(void)
  {
  	return !system_supports_mte();
  }
  #define arch_thp_swp_supported arch_thp_swp_supported

Otherwise it looks like arch_thp_swp_supported is dependent on the
symbol system_supports_mte being NULL (not on the return from the function).

Steve

>>
>>> +
>>>  /*
>>>   * Outside of a few very special situations (e.g. hibernation), we always
>>>   * use broadcast TLB invalidation instructions, therefore a spurious page
>>> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
>>> index de29821231c9..4ddaf6ad73ef 100644
>>> --- a/include/linux/huge_mm.h
>>> +++ b/include/linux/huge_mm.h
>>> @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio,
>>>       return split_huge_page_to_list(&folio->page, list);
>>>  }
>>>
>>> +/*
>>> + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
>>> + * limitations in the implementation like arm64 MTE can override this to
>>
>> A small nit.
>>
>> A 'comma' here will be better. s/arm64 MTE can/arm64 MTE, can/
> 
> yep.
> 
>>
>>> + * false
>>
>> Similarly a 'full stop' here will be better as well.
>>
> 
> yep.
> 
>>> + */
>>> +#ifndef arch_thp_swp_supported
>>> +static inline bool arch_thp_swp_supported(void)
>>> +{
>>> +     return true;
>>> +}
>>> +#endif
>>> +
>>>  #endif /* _LINUX_HUGE_MM_H */
>>> diff --git a/mm/swap_slots.c b/mm/swap_slots.c
>>> index 2a65a89b5b4d..10b94d64cc25 100644
>>> --- a/mm/swap_slots.c
>>> +++ b/mm/swap_slots.c
>>> @@ -307,7 +307,7 @@ swp_entry_t folio_alloc_swap(struct folio *folio)
>>>       entry.val = 0;
>>>
>>>       if (folio_test_large(folio)) {
>>> -             if (IS_ENABLED(CONFIG_THP_SWAP))
>>> +             if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
>>>                       get_swap_pages(1, &entry, folio_nr_pages(folio));
>>>               goto out;
>>>       }
>>
>> - Anshuman
> 
> Thanks
> Barry
Andrew Morton May 30, 2022, 10:17 p.m. UTC | #4
On Fri, 27 May 2022 22:06:44 +1200 Barry Song <21cnbao@gmail.com> wrote:

> From: Barry Song <v-songbaohua@oppo.com>
> 
> THP_SWAP has been proved to improve the swap throughput significantly
> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
> splitting THP after swapped out").
> As long as arm64 uses 4K page size, it is quite similar with x86_64
> by having 2MB PMD THP. So we are going to get similar improvement.
> For other page sizes such as 16KB and 64KB, PMD might be too large.
> Negative side effects such as IO latency might be a problem. Thus,
> we can only safely enable the counterpart of X86_64.
> A corner case is that MTE has an assumption that only base pages
> can be swapped. We won't enable THP_SWP for ARM64 hardware with
> MTE support until MTE is re-arched.
> 
> ...
>
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -45,6 +45,8 @@
>  	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>  
> +#define arch_thp_swp_supported !system_supports_mte

Does that even work?

	if (arch_thp_swp_supported())

expands to

	if (!system_supports_mte())

so I guess it does work.  Is this ugly party trick required for some
reason?  If so, an apologetic comment describing why would be helpful.

Otherwise, can we use a static inline function here, as we do with the
stub function?

> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -461,4 +461,16 @@ static inline int split_folio_to_list(struct folio *folio,
>  	return split_huge_page_to_list(&folio->page, list);
>  }
>  
> +/*
> + * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
> + * limitations in the implementation like arm64 MTE can override this to
> + * false
> + */
> +#ifndef arch_thp_swp_supported
> +static inline bool arch_thp_swp_supported(void)
> +{
> +	return true;
> +}

Missing a #define arch_thp_swp_supported arch_thp_swp_supported here.

> +#endif
> +
>  #endif /* _LINUX_HUGE_MM_H */
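
That is, so the usual "#define foo foo" convention kicks in and
#ifdef-style checks keep working; something like this (untested):

  #ifndef arch_thp_swp_supported
  static inline bool arch_thp_swp_supported(void)
  {
  	return true;
  }
  #define arch_thp_swp_supported arch_thp_swp_supported
  #endif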

Otherwise looks OK to me.  Please include it in the arm64 tree if/when
it's considered ready.
Anshuman Khandual May 31, 2022, 2:32 a.m. UTC | #5
On 5/30/22 15:23, Barry Song wrote:
> On Mon, May 30, 2022 at 7:07 PM Anshuman Khandual
> <anshuman.khandual@arm.com> wrote:
>>
>> Hello Barry,
> 
> Hi Anshuman,
> thanks!
> 
>>
>> On 5/27/22 15:36, Barry Song wrote:
>>> From: Barry Song <v-songbaohua@oppo.com>
>>>
>>> THP_SWAP has been proved to improve the swap throughput significantly
>>> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
>>> splitting THP after swapped out").
>> It will be useful to run similar experiments on an arm64 platform to
>> demonstrate a tangible benefit; otherwise we might just be enabling
>> this feature because x86 has it. Do you have some data points?
>>
>>> As long as arm64 uses 4K page size, it is quite similar with x86_64
>>> by having 2MB PMD THP. So we are going to get similar improvement.
>>
>> This is an assumption without any data points (until now). Please
>> do provide some results.
> 
> Fair enough, though I believe THP_SWP is arch-independent. We will
> post data from our testing. Plus, we do need it for real use cases,
> with some possible out-of-tree code for the moment, so this patch
> does not originate only because x86 has it :-)

I understand, but as you mentioned, some data here would be helpful.

> 
>>
>>> For other page sizes such as 16KB and 64KB, PMD might be too large.
>>> Negative side effects such as IO latency might be a problem. Thus,
>>> we can only safely enable the counterpart of X86_64.
>>
>> Incorrect reasoning. Although it might sometimes be okay to enable
>> a feature on a platform based on assumptions about its benefits,
>> claiming 'similar improvement, safely, .. etc' while comparing
>> against the x86 4K page config without data points is not very
>> helpful.
>>
>>> A corner case is that MTE has an assumption that only base pages
>>> can be swapped. We won't enable THP_SWP for ARM64 hardware with
>>> MTE support until MTE is re-arched.
>>
>> re-arched?? Did you imply that MTE is reworked to support THP?
> 
> I think at least MTE should be able to coexist with THP_SWP, though
> I am not quite sure whether MTE can be reworked to fully support THP.

Understood, but I just wanted the wording above in the commit message
changed to literally anything other than 're-arched'.
Anshuman Khandual May 31, 2022, 2:40 a.m. UTC | #6
On 5/30/22 16:39, Steven Price wrote:
> On 30/05/2022 10:53, Barry Song wrote:
>> On Mon, May 30, 2022 at 7:07 PM Anshuman Khandual
>> <anshuman.khandual@arm.com> wrote:
>>>
>>> Hello Barry,
>>
>> Hi Anshuman,
>> thanks!
>>
>>>
>>> On 5/27/22 15:36, Barry Song wrote:
>>>> From: Barry Song <v-songbaohua@oppo.com>
>>>>
>>>> THP_SWAP has been proved to improve the swap throughput significantly
>>>> on x86_64 according to commit bd4c82c22c367e ("mm, THP, swap: delay
>>>> splitting THP after swapped out").
>>> It will be useful to run similar experiments on an arm64 platform to
>>> demonstrate a tangible benefit; otherwise we might just be enabling
>>> this feature because x86 has it. Do you have some data points?
>>>
>>>> As long as arm64 uses 4K page size, it is quite similar with x86_64
>>>> by having 2MB PMD THP. So we are going to get similar improvement.
>>>
>>> This is an assumption without any data points (until now). Please
>>> do provide some results.
>>
>> Fair enough, though I believe THP_SWP is arch-independent. We will
>> post data from our testing. Plus, we do need it for real use cases,
>> with some possible out-of-tree code for the moment, so this patch
>> does not originate only because x86 has it :-)
>>
>>>
>>>> For other page sizes such as 16KB and 64KB, PMD might be too large.
>>>> Negative side effects such as IO latency might be a problem. Thus,
>>>> we can only safely enable the counterpart of X86_64.
>>>
>>> Incorrect reasoning. Although it might sometimes be okay to enable
>>> a feature on a platform based on assumptions about its benefits,
>>> claiming 'similar improvement, safely, .. etc' while comparing
>>> against the x86 4K page config without data points is not very
>>> helpful.
>>>
>>>> A corner case is that MTE has an assumption that only base pages
>>>> can be swapped. We won't enable THP_SWP for ARM64 hardware with
>>>> MTE support until MTE is re-arched.
>>>
>>> re-arched?? Did you imply that MTE is reworked to support THP?
>>
>> I think at least MTE should be able to coexist with THP_SWP, though
>> I am not quite sure whether MTE can be reworked to fully support THP.
> 
> There's no fundamental reason it cannot coexist, but there are many open
> areas around MTE support in general. For example at the moment swap
> support keeps the tags in memory because there's no easy way to plumb
> the extra tag data into the swap infrastructure.
> 
> The lazy zeroing of MTE tag storage has introduced a lot of complexity
> and THP is another case where this complexity would show. It's possible
> that it might make sense to take the hit of clearing tags in all pages
> (i.e. make clear_page() clear the tags like mte_zero_clear_page_tags()).
> 
>>>
>>>>
>>>> Cc: "Huang, Ying" <ying.huang@intel.com>
>>>> Cc: Minchan Kim <minchan@kernel.org>
>>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>>>> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
>>>> Cc: Steven Price <steven.price@arm.com>
>>>> Cc: Yang Shi <shy828301@gmail.com>
>>>> Signed-off-by: Barry Song <v-songbaohua@oppo.com>
>>>> ---
>>>>  arch/arm64/Kconfig               |  1 +
>>>>  arch/arm64/include/asm/pgtable.h |  2 ++
>>>>  include/linux/huge_mm.h          | 12 ++++++++++++
>>>>  mm/swap_slots.c                  |  2 +-
>>>>  4 files changed, 16 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
>>>> index a4968845e67f..5306009df2dc 100644
>>>> --- a/arch/arm64/Kconfig
>>>> +++ b/arch/arm64/Kconfig
>>>> @@ -101,6 +101,7 @@ config ARM64
>>>>       select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
>>>>       select ARCH_WANT_LD_ORPHAN_WARN
>>>>       select ARCH_WANTS_NO_INSTR
>>>> +     select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
>>>>       select ARCH_HAS_UBSAN_SANITIZE_ALL
>>>>       select ARM_AMBA
>>>>       select ARM_ARCH_TIMER
>>>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
>>>> index 0b6632f18364..06076139c72c 100644
>>>> --- a/arch/arm64/include/asm/pgtable.h
>>>> +++ b/arch/arm64/include/asm/pgtable.h
>>>> @@ -45,6 +45,8 @@
>>>>       __flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
>>>>  #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
>>>>
>>>> +#define arch_thp_swp_supported !system_supports_mte
>>>
>>> Does it check for 'system_supports_mte' as a symbol, or call
>>> system_supports_mte() to ascertain runtime MTE support? It might
>>> well be correct, but it just does not look very intuitive.
>>
>> Yep, it looks a bit weird. But considering we only need this for
>> arm64, and arch_thp_swp_supported is a macro, I can't find a better
>> way to keep the code modification in mm core, arm64 and x86 smaller
>> than this. And we will probably remove it entirely once we make MTE
>> coexist with THP_SWP.
>>
>> Do you have any suggestions for a better solution?
> 
> It would be better to write it as a function macro:
> 
>   #define arch_thp_swp_supported() (!system_supports_mte())
> 
> or you could go the whole way and introduce a static inline function
> (overkill in this case IMHO):
> 
>   static inline bool arch_thp_swp_supported(void)
>   {
>   	return !system_supports_mte();
>   }
>   #define arch_thp_swp_supported arch_thp_swp_supported

I guess this approach is slightly better.
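
For example, the arm64 side could end up looking like this after
folding in both suggestions (just a sketch of a possible v3, not the
posted patch):

  /* hypothetical v3, arch/arm64/include/asm/pgtable.h */
  static inline bool arch_thp_swp_supported(void)
  {
  	/* Skip THP swapping until MTE can handle tags for huge pages. */
  	return !system_supports_mte();
  }
  #define arch_thp_swp_supported arch_thp_swp_supported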

Patch

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a4968845e67f..5306009df2dc 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -101,6 +101,7 @@  config ARM64
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_NO_INSTR
+	select ARCH_WANTS_THP_SWAP if ARM64_4K_PAGES
 	select ARCH_HAS_UBSAN_SANITIZE_ALL
 	select ARM_AMBA
 	select ARM_ARCH_TIMER
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 0b6632f18364..06076139c72c 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -45,6 +45,8 @@ 
 	__flush_tlb_range(vma, addr, end, PUD_SIZE, false, 1)
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
+#define arch_thp_swp_supported !system_supports_mte
+
 /*
  * Outside of a few very special situations (e.g. hibernation), we always
  * use broadcast TLB invalidation instructions, therefore a spurious page
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index de29821231c9..4ddaf6ad73ef 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -461,4 +461,16 @@  static inline int split_folio_to_list(struct folio *folio,
 	return split_huge_page_to_list(&folio->page, list);
 }
 
+/*
+ * archs that select ARCH_WANTS_THP_SWAP but don't support THP_SWP due to
+ * limitations in the implementation like arm64 MTE can override this to
+ * false
+ */
+#ifndef arch_thp_swp_supported
+static inline bool arch_thp_swp_supported(void)
+{
+	return true;
+}
+#endif
+
 #endif /* _LINUX_HUGE_MM_H */
diff --git a/mm/swap_slots.c b/mm/swap_slots.c
index 2a65a89b5b4d..10b94d64cc25 100644
--- a/mm/swap_slots.c
+++ b/mm/swap_slots.c
@@ -307,7 +307,7 @@  swp_entry_t folio_alloc_swap(struct folio *folio)
 	entry.val = 0;
 
 	if (folio_test_large(folio)) {
-		if (IS_ENABLED(CONFIG_THP_SWAP))
+		if (IS_ENABLED(CONFIG_THP_SWAP) && arch_thp_swp_supported())
 			get_swap_pages(1, &entry, folio_nr_pages(folio));
 		goto out;
 	}