
[RFC,0/6] mm: thp: use generic THP migration for NUMA hinting fault

Message ID 20210329183312.178266-1-shy828301@gmail.com (mailing list archive)
Series: mm: thp: use generic THP migration for NUMA hinting fault

Message

Yang Shi March 29, 2021, 6:33 p.m. UTC
When the THP NUMA fault support was added, THP migration was not supported
yet, so an ad hoc THP migration was implemented in NUMA fault handling.  THP
migration has been supported since v4.14, so it doesn't make much sense to
keep a separate THP migration implementation rather than using the generic
migration code.  Keeping two THP migration implementations for different code
paths is definitely a maintenance burden, and it is more error-prone.  Using
the generic THP migration implementation allows us to remove the duplicate
code and some hacks needed by the old ad hoc implementation.
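To make the idea concrete, here is a toy user-space model in which the NUMA hinting fault handler sends both base pages and THPs through one generic migration routine instead of keeping a separate THP path. All names are hypothetical and this is not kernel code; in the kernel, the generic entry point discussed later in this thread is migrate_pages(), reached via migrate_misplaced_page().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy user-space model; every name here is hypothetical, not a kernel API. */
enum page_kind { BASE_PAGE, THP };

struct model_page {
	enum page_kind kind;
	int nid;                /* NUMA node the page currently lives on */
};

struct migration_stats {
	int generic_calls;      /* how often the single generic path ran */
};

/* Stand-in for the one generic migration routine (migrate_pages() upstream). */
static bool generic_migrate(struct model_page *page, int target_nid,
			    struct migration_stats *st)
{
	assert(target_nid >= 0);
	st->generic_calls++;
	page->nid = target_nid;
	return true;
}

/* After the series: base pages and THPs share the same migration path. */
static bool handle_numa_hinting_fault(struct model_page *page, int target_nid,
				      struct migration_stats *st)
{
	if (page->nid == target_nid)
		return false;   /* already local, nothing to migrate */
	return generic_migrate(page, target_nid, st);
}
```

The point of the model is that there is no `kind == THP` branch in the fault handler to maintain; any THP-specific work lives inside the generic routine.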

A quick grep shows that x86_64, PowerPC (book3s), ARM64 and S390 support both
THP and NUMA balancing.  Most of them support THP migration, except for S390.
Zi Yan tried to add THP migration support for S390 before, but it was not
accepted due to the design of the S390 PMD.  For the discussion, please see:
https://lkml.org/lkml/2018/4/27/953.

I'm not an expert on S390, so I'm not sure whether it is feasible to support
THP migration for S390.  If it is not feasible, the patchset may leave THP NUMA
balancing non-functional on S390.  I'm not sure whether this is a show stopper,
although the patchset does simplify the code a lot.  Anyway, it seems worth
posting the series to the mailing list to get some feedback.
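The dependency described above can be summarised as a small lookup table. This is a hedged sketch based only on the capabilities listed in this cover letter. In the kernel the relevant switch is CONFIG_ARCH_ENABLE_THP_MIGRATION, tested through thp_migration_supported(); the table and helper names below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capability table, filled in from the cover letter (March 2021). */
struct arch_caps {
	const char *name;
	bool thp;
	bool numa_balancing;
	bool thp_migration;
};

static const struct arch_caps arch_table[] = {
	{ "x86_64",         true, true, true  },
	{ "powerpc-book3s", true, true, true  },
	{ "arm64",          true, true, true  },
	{ "s390",           true, true, false },
};

/* With the series applied, THP NUMA balancing needs generic THP migration. */
static bool thp_numa_balancing_works(const char *arch)
{
	for (size_t i = 0; i < sizeof(arch_table) / sizeof(arch_table[0]); i++) {
		if (strcmp(arch_table[i].name, arch) == 0)
			return arch_table[i].thp && arch_table[i].numa_balancing &&
			       arch_table[i].thp_migration;
	}
	return false;   /* unknown architecture: assume not supported */
}
```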

Patches #1 ~ #3 are preparation and cleanup patches.
Patch #4 is the real meat.
Patch #5 keeps the THP unsplit if migration fails for NUMA hinting.
Patch #6 removes a hack about the page refcount.

I saw some gup-related hacks in the git history, but I couldn't figure out
whether they have been removed, since I only found FOLL_NUMA code in the
current gup implementation and that code seems useful.


Yang Shi (6):
      mm: memory: add orig_pmd to struct vm_fault
      mm: memory: make numa_migrate_prep() non-static
      mm: migrate: teach migrate_misplaced_page() about THP
      mm: thp: refactor NUMA fault handling
      mm: migrate: don't split THP for misplaced NUMA page
      mm: migrate: remove redundant page count check for THP

 include/linux/huge_mm.h |   9 ++---
 include/linux/migrate.h |  29 ++-------------
 include/linux/mm.h      |   1 +
 mm/huge_memory.c        | 141 +++++++++++++++++++---------------------------------------------------
 mm/internal.h           |   3 ++
 mm/memory.c             |  33 ++++++++---------
 mm/migrate.c            | 190 ++++++++++++++---------------------------------------------------------------------------------
 7 files changed, 94 insertions(+), 312 deletions(-)

Comments

Gerald Schaefer March 30, 2021, 2:42 p.m. UTC | #1
On Mon, 29 Mar 2021 11:33:06 -0700
Yang Shi <shy828301@gmail.com> wrote:

> [...]

The reason why THP migration cannot work on s390 is because the migration
code will establish swap ptes in a pmd. The pmd layout is very different from
the pte layout on s390, so you cannot simply write a swap pte into a pmd.
There are no separate swp primitives for swap/migration pmds, IIRC. And even
if there were, we'd still need to find some space for a present bit in the
s390 pmd, and/or possibly move around some other bits.

A lot of things can go wrong here, even if it could be possible in theory,
by introducing separate swp primitives in common code for pmd entries, along
with separate offset, type, shift, etc. I don't see that happening in the
near future.
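A toy illustration of why reusing the PTE swap format in a PMD is a problem. The bit layouts below are entirely made up and are NOT the real s390 PTE or PMD formats; the sketch only shows that once a PMD interprets some bits differently from a PTE, an arbitrary PTE-format swap value lands on bits the PMD does not leave to the OS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HYPOTHETICAL layouts for illustration only; not the real s390 formats. */
#define PTE_SWP_TYPE_BITS  5        /* swap type in bits 0-4 */
#define PTE_SWP_OFF_SHIFT  5        /* swap offset starts at bit 5 */
#define PMD_HW_BITS_MASK   0x3FULL  /* pretend bits 0-5 are hardware-defined in a PMD */

/* Pack a swap entry the way this made-up PTE format would. */
static uint64_t pte_swp_entry(unsigned int type, uint64_t offset)
{
	assert(type < (1u << PTE_SWP_TYPE_BITS));
	return (uint64_t)type | (offset << PTE_SWP_OFF_SHIFT);
}

/* Writing the raw value into a PMD is only harmless if it leaves every
 * hardware-defined PMD bit clear. */
static bool safe_as_pmd(uint64_t val)
{
	return (val & PMD_HW_BITS_MASK) == 0;
}
```

In this made-up layout only entries whose type and lowest offset bit happen to be zero survive, which is why separate PMD swap primitives with their own offset, type and shift would be needed.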

Not sure if this is a show stopper, but I am not familiar enough with
NUMA and migration code to judge. E.g., I do not see any swp entry action
in your patches, but I assume this is implicitly triggered by the switch
to generic THP migration code.

Could there be a work-around by splitting THP pages instead of marking them
as migrate pmds (via pte swap entries), at least when THP migration is not
supported? I guess it could also be acceptable if THP pages were simply not
migrated for NUMA balancing on s390, but then we might need some extra config
option to make that behavior explicit.
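The two options outlined above (split, or simply do not migrate, possibly behind a config option) can be written as a small decision function. This is a sketch only; the enum names and the policy knob are hypothetical, and nothing in the thread defines such a config option.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical policy knob standing in for "some extra config option". */
enum thp_numa_policy { THP_NUMA_SPLIT, THP_NUMA_SKIP };

enum numa_action { ACT_MIGRATE_THP, ACT_SPLIT_THEN_MIGRATE, ACT_SKIP };

static enum numa_action thp_numa_action(bool thp_migration_supported,
					enum thp_numa_policy policy)
{
	if (thp_migration_supported)
		return ACT_MIGRATE_THP;  /* generic THP migration, PMD stays huge */
	/* No PMD migration entries possible: fall back to PTEs or leave the THP alone. */
	return policy == THP_NUMA_SPLIT ? ACT_SPLIT_THEN_MIGRATE : ACT_SKIP;
}
```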

See also my comment on patch #5 of this series.

Regards,
Gerald
Yang Shi March 30, 2021, 4:51 p.m. UTC | #2
On Tue, Mar 30, 2021 at 7:42 AM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Mon, 29 Mar 2021 11:33:06 -0700
> Yang Shi <shy828301@gmail.com> wrote:
>
> >
> > When the THP NUMA fault support was added THP migration was not supported yet.
> > So the ad hoc THP migration was implemented in NUMA fault handling.  Since v4.14
> > THP migration has been supported so it doesn't make too much sense to still keep
> > another THP migration implementation rather than using the generic migration
> > code.  It is definitely a maintenance burden to keep two THP migration
> > implementation for different code paths and it is more error prone.  Using the
> > generic THP migration implementation allows us remove the duplicate code and
> > some hacks needed by the old ad hoc implementation.
> >
> > A quick grep shows x86_64, PowerPC (book3s), ARM64 ans S390 support both THP
> > and NUMA balancing.  The most of them support THP migration except for S390.
> > Zi Yan tried to add THP migration support for S390 before but it was not
> > accepted due to the design of S390 PMD.  For the discussion, please see:
> > https://lkml.org/lkml/2018/4/27/953.
> >
> > I'm not expert on S390 so not sure if it is feasible to support THP migration
> > for S390 or not.  If it is not feasible then the patchset may make THP NUMA
> > balancing not be functional on S390.  Not sure if this is a show stopper although
> > the patchset does simplify the code a lot.  Anyway it seems worth posting the
> > series to the mailing list to get some feedback.
>
> The reason why THP migration cannot work on s390 is because the migration
> code will establish swap ptes in a pmd. The pmd layout is very different from
> the pte layout on s390, so you cannot simply write a swap pte into a pmd.
> There are no separate swp primitives for swap/migration pmds, IIRC. And even
> if there were, we'd still need to find some space for a present bit in the
> s390 pmd, and/or possibly move around some other bits.
>
> A lot of things can go wrong here, even if it could be possible in theory,
> by introducing separate swp primitives in common code for pmd entries, along
> with separate offset, type, shift, etc. I don't see that happening in the
> near future.

Thanks a lot for the elaboration. IIUC, implementing a migration PMD
entry is *not* prevented by the hardware, but it may be very tricky to
implement, right?

>
> Not sure if this is a show stopper, but I am not familiar enough with
> NUMA and migration code to judge. E.g., I do not see any swp entry action
> in your patches, but I assume this is implicitly triggered by the switch
> to generic THP migration code.

Yes, exactly. The migrate_pages() called by migrate_misplaced_page()
takes care of everything.
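Roughly, the generic path unmaps the page by installing migration entries, copies the data, then restores present entries pointing at the new page. The toy model below (hypothetical names, not the kernel's code) shows those three steps; for a THP, step 1 is where a PMD-level migration entry is needed, which is what s390 cannot provide.

```c
#include <assert.h>
#include <string.h>

/* Toy model of the generic migration sequence; names are hypothetical. */
enum entry_state { ENTRY_PRESENT, ENTRY_MIGRATION };

struct mapping {
	enum entry_state state;
	int pfn;                /* index of the frame the entry points at */
};

struct frame {
	char data[16];
};

static void model_migrate(struct mapping *m, struct frame *frames, int new_pfn)
{
	assert(m->state == ENTRY_PRESENT && new_pfn != m->pfn);
	/* 1. Unmap: replace the present entry with a migration entry. In the
	 *    kernel, accesses during migration fault and wait on this entry. */
	m->state = ENTRY_MIGRATION;
	/* 2. Copy the contents to the destination frame. */
	memcpy(frames[new_pfn].data, frames[m->pfn].data, sizeof(frames[0].data));
	/* 3. Remap: point the entry at the new frame and make it present again. */
	m->pfn = new_pfn;
	m->state = ENTRY_PRESENT;
}
```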

>
> Could there be a work-around by splitting THP pages instead of marking them
> as migrate pmds (via pte swap entries), at least when THP migration is not
> supported? I guess it could also be acceptable if THP pages were simply not
> migrated for NUMA balancing on s390, but then we might need some extra config
> option to make that behavior explicit.

Yes, it could. The old migration behavior was to return -ENOMEM if
THP migration was not supported, and then split the THP. That behavior
was not very friendly to some use cases, for example, memory policy
and the upcoming migration in lieu of reclaim. But I don't mean we
should restore the old behavior. We could split the THP if migration
returns -ENOSYS and the page is a THP.
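The difference between the old and the proposed fallback can be sketched as two predicates. This is a hypothetical model of the decision only, not the kernel's migrate_pages() code. One reading of the proposal: with -ENOMEM, a genuine allocation failure and "THP migration unsupported" look the same, whereas a dedicated -ENOSYS lets the caller split only in the unsupported case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Old behaviour as described: "unsupported" was reported as -ENOMEM and the
 * THP was split, so a real allocation failure triggered the same split. */
static bool old_should_split(int migrate_ret, bool is_thp)
{
	return is_thp && migrate_ret == -ENOMEM;
}

/* Proposed: split only when migration reports THP migration as unsupported. */
static bool new_should_split(int migrate_ret, bool is_thp)
{
	return is_thp && migrate_ret == -ENOSYS;
}
```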

>
> See also my comment on patch #5 of this series.
>
> Regards,
> Gerald
Gerald Schaefer March 31, 2021, 11:47 a.m. UTC | #3
On Tue, 30 Mar 2021 09:51:46 -0700
Yang Shi <shy828301@gmail.com> wrote:

> On Tue, Mar 30, 2021 at 7:42 AM Gerald Schaefer
> <gerald.schaefer@linux.ibm.com> wrote:
> >
> > On Mon, 29 Mar 2021 11:33:06 -0700
> > Yang Shi <shy828301@gmail.com> wrote:
> >  
> > >
> > > When the THP NUMA fault support was added THP migration was not supported yet.
> > > So the ad hoc THP migration was implemented in NUMA fault handling.  Since v4.14
> > > THP migration has been supported so it doesn't make too much sense to still keep
> > > another THP migration implementation rather than using the generic migration
> > > code.  It is definitely a maintenance burden to keep two THP migration
> > > implementation for different code paths and it is more error prone.  Using the
> > > generic THP migration implementation allows us remove the duplicate code and
> > > some hacks needed by the old ad hoc implementation.
> > >
> > > A quick grep shows x86_64, PowerPC (book3s), ARM64 ans S390 support both THP
> > > and NUMA balancing.  The most of them support THP migration except for S390.
> > > Zi Yan tried to add THP migration support for S390 before but it was not
> > > accepted due to the design of S390 PMD.  For the discussion, please see:
> > > https://lkml.org/lkml/2018/4/27/953.
> > >
> > > I'm not expert on S390 so not sure if it is feasible to support THP migration
> > > for S390 or not.  If it is not feasible then the patchset may make THP NUMA
> > > balancing not be functional on S390.  Not sure if this is a show stopper although
> > > the patchset does simplify the code a lot.  Anyway it seems worth posting the
> > > series to the mailing list to get some feedback.  
> >
> > The reason why THP migration cannot work on s390 is because the migration
> > code will establish swap ptes in a pmd. The pmd layout is very different from
> > the pte layout on s390, so you cannot simply write a swap pte into a pmd.
> > There are no separate swp primitives for swap/migration pmds, IIRC. And even
> > if there were, we'd still need to find some space for a present bit in the
> > s390 pmd, and/or possibly move around some other bits.
> >
> > A lot of things can go wrong here, even if it could be possible in theory,
> > by introducing separate swp primitives in common code for pmd entries, along
> > with separate offset, type, shift, etc. I don't see that happening in the
> > near future.  
> 
> Thanks a lot for the elaboration. IIUC, implementing a migration PMD
> entry is *not* prevented by the hardware, but it may be very tricky to
> implement, right?

Well, it depends. The HW is preventing proper full-blown swap + migration
support for PMD, similar to what we have for PTE, because we simply don't
have enough OS-defined bits in the PMD. A 5-bit swap type for example,
similar to a PTE, plus the PFN would not be possible.

The HW would not prevent a similar mechanism in principle, i.e. we could
mark it as invalid to trigger a fault, and have some magic bits that tell
the fault handler or migration code what it is about.

For handling migration aspects only, w/o any swap device or other support, a
single type bit could already be enough, to indicate read/write migration,
plus a "present" bit similar to PTE. But even those 2 bits would be hard to
find, though I would not entirely rule that out. That would be the tricky
part.

Then of course, common code would need some changes, to reflect the
different swap/migration (type) capabilities of PTE and PMD entries.
Not sure if such an approach would be acceptable for common code.

But this is just some very abstract and optimistic view, I have not
really properly looked into the details. So it might be even more
tricky, or not possible at all.
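The bit counting above can be turned into a tiny budget check. The requirements come from this mail: a 5-bit type for full swap support versus a single read/write type bit for migration only, each plus a present bit (counting the present bit in the full-swap case follows the earlier remark that space for one would be needed). The number of free OS-defined bits in an s390 PMD is not stated, so it is a parameter; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit requirements as described in the mail; everything else is hypothetical. */
#define FULL_SWAP_TYPE_BITS 5   /* 5-bit swap type, similar to a PTE */
#define MIGRATION_TYPE_BITS 1   /* read vs. write migration only */
#define PRESENT_BITS        1   /* "present" marker, similar to a PTE */

/* OS-defined PMD bits needed on top of the PFN/offset field. */
static int pmd_bits_needed(bool migration_only)
{
	int type_bits = migration_only ? MIGRATION_TYPE_BITS : FULL_SWAP_TYPE_BITS;

	return type_bits + PRESENT_BITS;
}

/* free_os_bits: spare OS-defined bits in the PMD (not stated in the thread). */
static bool pmd_layout_fits(bool migration_only, int free_os_bits)
{
	return pmd_bits_needed(migration_only) <= free_os_bits;
}
```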

> 
> >
> > [...]
> 
> Yes, exactly. The migrate_pages() called by migrate_misplaced_page()
> takes care of everything.
> 
> >
> > [...]
> 
> Yes, it could. The old migration behavior was to return -ENOMEM if
> THP migration was not supported, and then split the THP. That behavior
> was not very friendly to some use cases, for example, memory policy
> and the upcoming migration in lieu of reclaim. But I don't mean we
> should restore the old behavior. We could split the THP if migration
> returns -ENOSYS and the page is a THP.

OK, as long as we don't get any broken PMD migration entries established
for s390, some extra THP splitting would be acceptable I guess.
Mel Gorman March 31, 2021, 1:20 p.m. UTC | #4
On Tue, Mar 30, 2021 at 04:42:00PM +0200, Gerald Schaefer wrote:
> Could there be a work-around by splitting THP pages instead of marking them
> as migrate pmds (via pte swap entries), at least when THP migration is not
> supported? I guess it could also be acceptable if THP pages were simply not
> migrated for NUMA balancing on s390, but then we might need some extra config
> option to make that behavior explicit.
> 

The split is not done on other architectures simply because the loss
from splitting exceeded the gain of improved locality in too many cases.
However, it might be ok as an s390-specific workaround.

(Note, I haven't read the rest of the series due to lack of time but this
query caught my eye).
Yang Shi April 1, 2021, 8:10 p.m. UTC | #5
On Wed, Mar 31, 2021 at 4:47 AM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Tue, 30 Mar 2021 09:51:46 -0700
> Yang Shi <shy828301@gmail.com> wrote:
>
> > On Tue, Mar 30, 2021 at 7:42 AM Gerald Schaefer
> > <gerald.schaefer@linux.ibm.com> wrote:
> > >
> > > On Mon, 29 Mar 2021 11:33:06 -0700
> > > Yang Shi <shy828301@gmail.com> wrote:
> > >
> > > >
> > > > When the THP NUMA fault support was added THP migration was not supported yet.
> > > > So the ad hoc THP migration was implemented in NUMA fault handling.  Since v4.14
> > > > THP migration has been supported so it doesn't make too much sense to still keep
> > > > another THP migration implementation rather than using the generic migration
> > > > code.  It is definitely a maintenance burden to keep two THP migration
> > > > implementation for different code paths and it is more error prone.  Using the
> > > > generic THP migration implementation allows us remove the duplicate code and
> > > > some hacks needed by the old ad hoc implementation.
> > > >
> > > > A quick grep shows x86_64, PowerPC (book3s), ARM64 ans S390 support both THP
> > > > and NUMA balancing.  The most of them support THP migration except for S390.
> > > > Zi Yan tried to add THP migration support for S390 before but it was not
> > > > accepted due to the design of S390 PMD.  For the discussion, please see:
> > > > https://lkml.org/lkml/2018/4/27/953.
> > > >
> > > > I'm not expert on S390 so not sure if it is feasible to support THP migration
> > > > for S390 or not.  If it is not feasible then the patchset may make THP NUMA
> > > > balancing not be functional on S390.  Not sure if this is a show stopper although
> > > > the patchset does simplify the code a lot.  Anyway it seems worth posting the
> > > > series to the mailing list to get some feedback.
> > >
> > > The reason why THP migration cannot work on s390 is because the migration
> > > code will establish swap ptes in a pmd. The pmd layout is very different from
> > > the pte layout on s390, so you cannot simply write a swap pte into a pmd.
> > > There are no separate swp primitives for swap/migration pmds, IIRC. And even
> > > if there were, we'd still need to find some space for a present bit in the
> > > s390 pmd, and/or possibly move around some other bits.
> > >
> > > A lot of things can go wrong here, even if it could be possible in theory,
> > > by introducing separate swp primitives in common code for pmd entries, along
> > > with separate offset, type, shift, etc. I don't see that happening in the
> > > near future.
> >
> > Thanks a lot for the elaboration. IIUC, implementing a migration PMD
> > entry is *not* prevented by the hardware, but it may be very tricky to
> > implement, right?
>
> Well, it depends. The HW is preventing proper full-blown swap + migration
> support for PMD, similar to what we have for PTE, because we simply don't
> have enough OS-defined bits in the PMD. A 5-bit swap type for example,
> similar to a PTE, plus the PFN would not be possible.
>
> The HW would not prevent a similar mechanism in principle, i.e. we could
> mark it as invalid to trigger a fault, and have some magic bits that tell
> the fault handler or migration code what it is about.
>
> For handling migration aspects only, w/o any swap device or other support, a
> single type bit could already be enough, to indicate read/write migration,
> plus a "present" bit similar to PTE. But even those 2 bits would be hard to
> find, though I would not entirely rule that out. That would be the tricky
> part.
>
> Then of course, common code would need some changes, to reflect the
> different swap/migration (type) capabilities of PTE and PMD entries.
> Not sure if such an approach would be acceptable for common code.
>
> But this is just some very abstract and optimistic view, I have not
> really properly looked into the details. So it might be even more
> tricky, or not possible at all.

Thanks a lot for the elaboration.

>
> >
> > >
> > > [...]
> >
> > Yes, exactly. The migrate_pages() called by migrate_misplaced_page()
> > takes care of everything.
> >
> > >
> > > [...]
> >
> > Yes, it could. The old migration behavior was to return -ENOMEM if
> > THP migration was not supported, and then split the THP. That behavior
> > was not very friendly to some use cases, for example, memory policy
> > and the upcoming migration in lieu of reclaim. But I don't mean we
> > should restore the old behavior. We could split the THP if migration
> > returns -ENOSYS and the page is a THP.
>
> OK, as long as we don't get any broken PMD migration entries established
> for s390, some extra THP splitting would be acceptable I guess.

There will be no migration PMD installed. The current behavior is a
no-op if THP migration is not supported.
Yang Shi April 1, 2021, 8:12 p.m. UTC | #6
On Wed, Mar 31, 2021 at 6:20 AM Mel Gorman <mgorman@suse.de> wrote:
>
> On Tue, Mar 30, 2021 at 04:42:00PM +0200, Gerald Schaefer wrote:
> > Could there be a work-around by splitting THP pages instead of marking them
> > as migrate pmds (via pte swap entries), at least when THP migration is not
> > supported? I guess it could also be acceptable if THP pages were simply not
> > migrated for NUMA balancing on s390, but then we might need some extra config
> > option to make that behavior explicit.
> >
>
> The split is not done on other architectures simply because the loss
> from splitting exceeded the gain of improved locality in too many cases.
> However, it might be ok as an s390-specific workaround.
>
> (Note, I haven't read the rest of the series due to lack of time but this
> query caught my eye).

Will wait for your comments before I post v2. Thanks.

>
> --
> Mel Gorman
> SUSE Labs
Gerald Schaefer April 6, 2021, 12:02 p.m. UTC | #7
On Thu, 1 Apr 2021 13:10:49 -0700
Yang Shi <shy828301@gmail.com> wrote:

[...]
> > >
> > > Yes, it could. The old migration behavior was to return -ENOMEM if
> > > THP migration was not supported, and then split the THP. That behavior
> > > was not very friendly to some use cases, for example, memory policy
> > > and the upcoming migration in lieu of reclaim. But I don't mean we
> > > should restore the old behavior. We could split the THP if migration
> > > returns -ENOSYS and the page is a THP.
> >
> > OK, as long as we don't get any broken PMD migration entries established
> > for s390, some extra THP splitting would be acceptable I guess.
> 
> There will be no migration PMD installed. The current behavior is a
> no-op if THP migration is not supported.

Ok, just for completeness, since Mel also replied that the split
was not done on other architectures "because the loss from splitting
exceeded the gain of improved locality":

I did not mean to request extra splitting functionality for s390;
simply skipping / ignoring large PMDs would also be fine for s390,
with no need to add extra complexity.
Yang Shi April 6, 2021, 4:42 p.m. UTC | #8
On Tue, Apr 6, 2021 at 5:03 AM Gerald Schaefer
<gerald.schaefer@linux.ibm.com> wrote:
>
> On Thu, 1 Apr 2021 13:10:49 -0700
> Yang Shi <shy828301@gmail.com> wrote:
>
> [...]
>
> Ok, just for completeness, since Mel also replied that the split
> was not done on other architectures "because the loss from splitting
> exceeded the gain of improved locality":
>
> I did not mean to request extra splitting functionality for s390;
> simply skipping / ignoring large PMDs would also be fine for s390,
> with no need to add extra complexity.

Thank you, that could make life easier. The current code still converts
huge PMDs to PROT_NONE even though THP migration is not supported. It is
easy to skip such PMDs, which saves the cycles otherwise spent on
pointless NUMA hinting page faults.

Will do so in v2 if no objection from Mel as well.
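The v2 plan can be modelled as follows: when THP migration is unsupported, a huge PMD is simply not made PROT_NONE, so no NUMA hinting fault is taken for it. This is a hypothetical user-space sketch. In the kernel such a check would presumably sit where huge PMDs are changed for NUMA hinting (change_huge_pmd() in mm/huge_memory.c) and use thp_migration_supported(), but the code below is not that implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a page table entry for NUMA hinting purposes. */
struct model_pmd {
	bool huge;      /* maps a THP */
	bool protnone;  /* armed for a NUMA hinting fault */
};

/* Returns true if the entry was changed to PROT_NONE. */
static bool arm_numa_hinting(struct model_pmd *pmd, bool thp_migration_supported)
{
	/* The fault could never migrate this THP, so don't take it at all. */
	if (pmd->huge && !thp_migration_supported)
		return false;
	if (pmd->protnone)
		return false;   /* already armed */
	pmd->protnone = true;
	return true;
}
```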
Mel Gorman April 7, 2021, 8:32 a.m. UTC | #9
On Tue, Apr 06, 2021 at 09:42:07AM -0700, Yang Shi wrote:
> On Tue, Apr 6, 2021 at 5:03 AM Gerald Schaefer
> <gerald.schaefer@linux.ibm.com> wrote:
> >
> > [...]
> 
> Thank you, that could make life easier. The current code still converts
> huge PMDs to PROT_NONE even though THP migration is not supported. It is
> easy to skip such PMDs, which saves the cycles otherwise spent on
> pointless NUMA hinting page faults.
> 
> Will do so in v2 if no objection from Mel as well.

I did not get a chance to review this in time but if a v2 shows up,
I'll at least run it through a battery of tests to measure the impact
and hopefully find the time to do a proper review. Superficially I'm not
opposed to using generic code for migration because even if it shows up a
problem, it would be better to optimise the generic implementation than
carry two similar implementations. I'm undecided on whether s390 should
split+migrate rather than skip because I do not have a good overview of
"typical workloads on s390 that benefit from NUMA balancing".
Yang Shi April 7, 2021, 4:04 p.m. UTC | #10
On Wed, Apr 7, 2021 at 1:32 AM Mel Gorman <mgorman@suse.de> wrote:
>
> On Tue, Apr 06, 2021 at 09:42:07AM -0700, Yang Shi wrote:
> > [...]
>
> I did not get a chance to review this in time but if a v2 shows up,
> I'll at least run it through a battery of tests to measure the impact
> and hopefully find the time to do a proper review. Superficially I'm not
> opposed to using generic code for migration because even if it shows up a
> problem, it would be better to optimise the generic implementation than
> carry two similar implementations. I'm undecided on whether s390 should
> split+migrate rather than skip because I do not have a good overview of
> "typical workloads on s390 that benefit from NUMA balancing".

Thanks, Mel. I have no idea about S390 workloads either. I will just skip
huge PMDs for S390 for now, as Gerald suggested.

>
> --
> Mel Gorman
> SUSE Labs