diff mbox series

[v4,4/6] mm/memory-hotplug: add finite retries in offline_pages() if migration fails

Message ID 20241107235614.3637221-5-joannelkoong@gmail.com (mailing list archive)
State New
Headers show
Series fuse: remove temp page copies in writeback | expand

Commit Message

Joanne Koong Nov. 7, 2024, 11:56 p.m. UTC
In offline_pages(), do_migrate_range() may potentially retry forever if
the migration fails. Add a return value for do_migrate_range(), and
allow offline_page() to try migrating pages 5 times before erroring
out, similar to how migration failures in __alloc_contig_migrate_range()
is handled.

Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
---
 mm/memory_hotplug.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

Comments

SeongJae Park Nov. 8, 2024, 5:33 p.m. UTC | #1
+ David Hildenbrand

On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:

> In offline_pages(), do_migrate_range() may potentially retry forever if
> the migration fails. Add a return value for do_migrate_range(), and
> allow offline_page() to try migrating pages 5 times before erroring
> out, similar to how migration failures in __alloc_contig_migrate_range()
> is handled.

I'm curious if this could cause unexpected behavioral differences to memory
hotplugging users, and how '5' is chosen.  Could you please enlighten me?

> 
> Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> ---
>  mm/memory_hotplug.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 621ae1015106..49402442ea3b 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  	return 0;
>  }
>  
> -static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> +static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)

Seems the return value is used for only knowing if it is failed or not.  If
there is no plan to use the error code in future, what about using bool return
type?

>  {
>  	struct folio *folio;
>  	unsigned long pfn;
>  	LIST_HEAD(source);
>  	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
>  				      DEFAULT_RATELIMIT_BURST);
> +	int ret = 0;
>  
>  	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
>  		struct page *page;
> @@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
>  			.reason = MR_MEMORY_HOTPLUG,
>  		};
> -		int ret;
>  
>  		/*
>  		 * We have checked that migration range is on a single zone so
> @@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>  			putback_movable_pages(&source);
>  		}
>  	}
> +	return ret;
>  }
>  
>  static int __init cmdline_parse_movable_node(char *p)
> @@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  	const int node = zone_to_nid(zone);
>  	unsigned long flags;
>  	struct memory_notify arg;
> +	unsigned int tries = 0;
>  	char *reason;
>  	int ret;
>  
> @@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
>  
>  			ret = scan_movable_pages(pfn, end_pfn, &pfn);
>  			if (!ret) {
> -				/*
> -				 * TODO: fatal migration failures should bail
> -				 * out
> -				 */
> -				do_migrate_range(pfn, end_pfn);
> +				if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
> +					ret = -EBUSY;
>  			}

In the '++tries == 5' case, users will show the failure reason as "unmovable
page" from the debug log.  What about setting 'reason' here to be more
specific, e.g., "multiple migration failures"?

Also, my humble understanding of the intention of this change is as follow.  If
there are 'AS_WRITEBACK_MAY_BLOCK' pages in the migration target range,
do_migrate_range() will continuously fail.  And hence this could become
infinite loop.  Pleae let me know if I'm misunderstanding.

But if I'm not wrong...  There is a check for expected failures above
(scan_movable_pages()).  What about adding 'AS_WRITEBACK_MAY_BLOCK' pages
existence check there?

>  		} while (!ret);
>  
> -- 
> 2.43.5


Thanks,
SJ
David Hildenbrand Nov. 8, 2024, 6:56 p.m. UTC | #2
On 08.11.24 18:33, SeongJae Park wrote:
> + David Hildenbrand
> 
> On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> 
>> In offline_pages(), do_migrate_range() may potentially retry forever if
>> the migration fails. Add a return value for do_migrate_range(), and
>> allow offline_page() to try migrating pages 5 times before erroring
>> out, similar to how migration failures in __alloc_contig_migrate_range()
>> is handled.
> 
> I'm curious if this could cause unexpected behavioral differences to memory
> hotplugging users, and how '5' is chosen.  Could you please enlighten me?
> 

I'm wondering how much more often I'll have to nack such a patch. :)

On a related note: MAINTAINERS file exists for a reason.
David Hildenbrand Nov. 8, 2024, 7 p.m. UTC | #3
On 08.11.24 19:56, David Hildenbrand wrote:
> On 08.11.24 18:33, SeongJae Park wrote:
>> + David Hildenbrand
>>
>> On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
>>
>>> In offline_pages(), do_migrate_range() may potentially retry forever if
>>> the migration fails. Add a return value for do_migrate_range(), and
>>> allow offline_page() to try migrating pages 5 times before erroring
>>> out, similar to how migration failures in __alloc_contig_migrate_range()
>>> is handled.
>>
>> I'm curious if this could cause unexpected behavioral differences to memory
>> hotplugging users, and how '5' is chosen.  Could you please enlighten me?
>>
> 
> I'm wondering how much more often I'll have to nack such a patch. :)

A more recent discussion: 
https://lore.kernel.org/linux-mm/52161997-15aa-4093-a573-3bfd8da14ff1@fujitsu.com/T/#mdda39b2956a11c46f8da8796f9612ac007edbdab

Long story short: this is expected and documented
Shakeel Butt Nov. 8, 2024, 9:27 p.m. UTC | #4
On Fri, Nov 08, 2024 at 08:00:25PM +0100, David Hildenbrand wrote:
> On 08.11.24 19:56, David Hildenbrand wrote:
> > On 08.11.24 18:33, SeongJae Park wrote:
> > > + David Hildenbrand
> > > 
> > > On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> > > 
> > > > In offline_pages(), do_migrate_range() may potentially retry forever if
> > > > the migration fails. Add a return value for do_migrate_range(), and
> > > > allow offline_page() to try migrating pages 5 times before erroring
> > > > out, similar to how migration failures in __alloc_contig_migrate_range()
> > > > is handled.
> > > 
> > > I'm curious if this could cause unexpected behavioral differences to memory
> > > hotplugging users, and how '5' is chosen.  Could you please enlighten me?
> > > 
> > 
> > I'm wondering how much more often I'll have to nack such a patch. :)
> 
> A more recent discussion: https://lore.kernel.org/linux-mm/52161997-15aa-4093-a573-3bfd8da14ff1@fujitsu.com/T/#mdda39b2956a11c46f8da8796f9612ac007edbdab
> 
> Long story short: this is expected and documented

Thanks David for the background.

Joanne, simply drop this patch. It is not required for your series.
Joanne Koong Nov. 8, 2024, 9:42 p.m. UTC | #5
On Fri, Nov 8, 2024 at 1:27 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 08:00:25PM +0100, David Hildenbrand wrote:
> > On 08.11.24 19:56, David Hildenbrand wrote:
> > > On 08.11.24 18:33, SeongJae Park wrote:
> > > > + David Hildenbrand
> > > >
> > > > On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> > > >
> > > > > In offline_pages(), do_migrate_range() may potentially retry forever if
> > > > > the migration fails. Add a return value for do_migrate_range(), and
> > > > > allow offline_page() to try migrating pages 5 times before erroring
> > > > > out, similar to how migration failures in __alloc_contig_migrate_range()
> > > > > is handled.
> > > >
> > > > I'm curious if this could cause unexpected behavioral differences to memory
> > > > hotplugging users, and how '5' is chosen.  Could you please enlighten me?
> > > >
> > >
> > > I'm wondering how much more often I'll have to nack such a patch. :)
> >
> > A more recent discussion: https://lore.kernel.org/linux-mm/52161997-15aa-4093-a573-3bfd8da14ff1@fujitsu.com/T/#mdda39b2956a11c46f8da8796f9612ac007edbdab
> >
> > Long story short: this is expected and documented
>
> Thanks David for the background.
>
> Joanne, simply drop this patch. It is not required for your series.

Awesome, I'm happy to drop this patch.

Just curious though - don't we need this in order to mitigate the
scenario where if an unprivileged fuse server never completes
writeback, we don't run into this infinite loop? Or is it that memory
hotplugging is always initiated from userspace so if it does run into
an infinite loop (like also in that thread David linked), userspace is
responsible for sending a signal to terminate it?


Thanks,
Joanne
Joanne Koong Nov. 8, 2024, 9:59 p.m. UTC | #6
On Fri, Nov 8, 2024 at 9:33 AM SeongJae Park <sj@kernel.org> wrote:
>
> + David Hildenbrand
>
> On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
>
> > In offline_pages(), do_migrate_range() may potentially retry forever if
> > the migration fails. Add a return value for do_migrate_range(), and
> > allow offline_page() to try migrating pages 5 times before erroring
> > out, similar to how migration failures in __alloc_contig_migrate_range()
> > is handled.
>

Hi SeongJae,

Thanks for taking a look. I'm going to drop this patch per the
conversation with David and Shakeel below, but wanted to reply back to
some of the questions here for completion's sake.

> I'm curious if this could cause unexpected behavioral differences to memory
> hotplugging users, and how '5' is chosen.  Could you please enlighten me?
>

Most of this logic was copied from  __alloc_contig_migrate_range() -
in that function, '5' is hard-coded as the number of times to retry
for migration failures. No other reason for '5' other than that.

> >
> > Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
> > ---
> >  mm/memory_hotplug.c | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 621ae1015106..49402442ea3b 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1770,13 +1770,14 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> >       return 0;
> >  }
> >
> > -static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> > +static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
>
> Seems the return value is used for only knowing if it is failed or not.  If
> there is no plan to use the error code in future, what about using bool return
> type?
>
> >  {
> >       struct folio *folio;
> >       unsigned long pfn;
> >       LIST_HEAD(source);
> >       static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
> >                                     DEFAULT_RATELIMIT_BURST);
> > +     int ret = 0;
> >
> >       for (pfn = start_pfn; pfn < end_pfn; pfn++) {
> >               struct page *page;
> > @@ -1833,7 +1834,6 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >                       .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
> >                       .reason = MR_MEMORY_HOTPLUG,
> >               };
> > -             int ret;
> >
> >               /*
> >                * We have checked that migration range is on a single zone so
> > @@ -1863,6 +1863,7 @@ static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
> >                       putback_movable_pages(&source);
> >               }
> >       }
> > +     return ret;
> >  }
> >
> >  static int __init cmdline_parse_movable_node(char *p)
> > @@ -1940,6 +1941,7 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> >       const int node = zone_to_nid(zone);
> >       unsigned long flags;
> >       struct memory_notify arg;
> > +     unsigned int tries = 0;
> >       char *reason;
> >       int ret;
> >
> > @@ -2028,11 +2030,8 @@ int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
> >
> >                       ret = scan_movable_pages(pfn, end_pfn, &pfn);
> >                       if (!ret) {
> > -                             /*
> > -                              * TODO: fatal migration failures should bail
> > -                              * out
> > -                              */
> > -                             do_migrate_range(pfn, end_pfn);
> > +                             if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
> > +                                     ret = -EBUSY;
> >                       }
>
> In the '++tries == 5' case, users will show the failure reason as "unmovable
> page" from the debug log.  What about setting 'reason' here to be more
> specific, e.g., "multiple migration failures"?
>
> Also, my humble understanding of the intention of this change is as follow.  If
> there are 'AS_WRITEBACK_MAY_BLOCK' pages in the migration target range,
> do_migrate_range() will continuously fail.  And hence this could become
> infinite loop.  Pleae let me know if I'm misunderstanding.
>
> But if I'm not wrong...  There is a check for expected failures above
> (scan_movable_pages()).  What about adding 'AS_WRITEBACK_MAY_BLOCK' pages
> existence check there?

The main difference between adding migrate_pages() retries (this
patch) vs adding an 'AS_WRITEBACK_MAY_BLOCK' check in
scan_movable_pages() is that in the latter, all pages in an
'AS_WRITEBACK_MAY_BLOCK' mapping will be skipped for migration whereas
in the former, only pages under writeback will be skipped. I think the
latter is probably fine too for this case but the former seemed a bit
more optimal to me.

Thanks,
Joanne


>
> >               } while (!ret);
> >
> > --
> > 2.43.5
>
>
> Thanks,
> SJ
Shakeel Butt Nov. 8, 2024, 10:16 p.m. UTC | #7
On Fri, Nov 08, 2024 at 01:42:15PM -0800, Joanne Koong wrote:
> On Fri, Nov 8, 2024 at 1:27 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Fri, Nov 08, 2024 at 08:00:25PM +0100, David Hildenbrand wrote:
> > > On 08.11.24 19:56, David Hildenbrand wrote:
> > > > On 08.11.24 18:33, SeongJae Park wrote:
> > > > > + David Hildenbrand
> > > > >
> > > > > On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > >
> > > > > > In offline_pages(), do_migrate_range() may potentially retry forever if
> > > > > > the migration fails. Add a return value for do_migrate_range(), and
> > > > > > allow offline_page() to try migrating pages 5 times before erroring
> > > > > > out, similar to how migration failures in __alloc_contig_migrate_range()
> > > > > > is handled.
> > > > >
> > > > > I'm curious if this could cause unexpected behavioral differences to memory
> > > > > hotplugging users, and how '5' is chosen.  Could you please enlighten me?
> > > > >
> > > >
> > > > I'm wondering how much more often I'll have to nack such a patch. :)
> > >
> > > A more recent discussion: https://lore.kernel.org/linux-mm/52161997-15aa-4093-a573-3bfd8da14ff1@fujitsu.com/T/#mdda39b2956a11c46f8da8796f9612ac007edbdab
> > >
> > > Long story short: this is expected and documented
> >
> > Thanks David for the background.
> >
> > Joanne, simply drop this patch. It is not required for your series.
> 
> Awesome, I'm happy to drop this patch.
> 
> Just curious though - don't we need this in order to mitigate the
> scenario where if an unprivileged fuse server never completes
> writeback, we don't run into this infinite loop? Or is it that memory
> hotplugging is always initiated from userspace so if it does run into
> an infinite loop (like also in that thread David linked), userspace is
> responsible for sending a signal to terminate it?

I think irrespective of the source of the hotplug, the current behavior
of infinite retries in some cases is documented and kind of expected, so
no need to fix it. (Though I don't know all the source of hotplug.)
Joanne Koong Nov. 8, 2024, 10:20 p.m. UTC | #8
On Fri, Nov 8, 2024 at 2:16 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:42:15PM -0800, Joanne Koong wrote:
> > On Fri, Nov 8, 2024 at 1:27 PM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > >
> > > On Fri, Nov 08, 2024 at 08:00:25PM +0100, David Hildenbrand wrote:
> > > > On 08.11.24 19:56, David Hildenbrand wrote:
> > > > > On 08.11.24 18:33, SeongJae Park wrote:
> > > > > > + David Hildenbrand
> > > > > >
> > > > > > On Thu, 7 Nov 2024 15:56:12 -0800 Joanne Koong <joannelkoong@gmail.com> wrote:
> > > > > >
> > > > > > > In offline_pages(), do_migrate_range() may potentially retry forever if
> > > > > > > the migration fails. Add a return value for do_migrate_range(), and
> > > > > > > allow offline_page() to try migrating pages 5 times before erroring
> > > > > > > out, similar to how migration failures in __alloc_contig_migrate_range()
> > > > > > > is handled.
> > > > > >
> > > > > > I'm curious if this could cause unexpected behavioral differences to memory
> > > > > > hotplugging users, and how '5' is chosen.  Could you please enlighten me?
> > > > > >
> > > > >
> > > > > I'm wondering how much more often I'll have to nack such a patch. :)
> > > >
> > > > A more recent discussion: https://lore.kernel.org/linux-mm/52161997-15aa-4093-a573-3bfd8da14ff1@fujitsu.com/T/#mdda39b2956a11c46f8da8796f9612ac007edbdab
> > > >
> > > > Long story short: this is expected and documented
> > >
> > > Thanks David for the background.
> > >
> > > Joanne, simply drop this patch. It is not required for your series.
> >
> > Awesome, I'm happy to drop this patch.
> >
> > Just curious though - don't we need this in order to mitigate the
> > scenario where if an unprivileged fuse server never completes
> > writeback, we don't run into this infinite loop? Or is it that memory
> > hotplugging is always initiated from userspace so if it does run into
> > an infinite loop (like also in that thread David linked), userspace is
> > responsible for sending a signal to terminate it?
>
> I think irrespective of the source of the hotplug, the current behavior
> of infinite retries in some cases is documented and kind of expected, so
> no need to fix it. (Though I don't know all the source of hotplug.)
>

Awesome, this sounds great. Thanks!
>
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 621ae1015106..49402442ea3b 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1770,13 +1770,14 @@  static int scan_movable_pages(unsigned long start, unsigned long end,
 	return 0;
 }
 
-static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
+static int do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 {
 	struct folio *folio;
 	unsigned long pfn;
 	LIST_HEAD(source);
 	static DEFINE_RATELIMIT_STATE(migrate_rs, DEFAULT_RATELIMIT_INTERVAL,
 				      DEFAULT_RATELIMIT_BURST);
+	int ret = 0;
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
 		struct page *page;
@@ -1833,7 +1834,6 @@  static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 			.reason = MR_MEMORY_HOTPLUG,
 		};
-		int ret;
 
 		/*
 		 * We have checked that migration range is on a single zone so
@@ -1863,6 +1863,7 @@  static void do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			putback_movable_pages(&source);
 		}
 	}
+	return ret;
 }
 
 static int __init cmdline_parse_movable_node(char *p)
@@ -1940,6 +1941,7 @@  int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 	const int node = zone_to_nid(zone);
 	unsigned long flags;
 	struct memory_notify arg;
+	unsigned int tries = 0;
 	char *reason;
 	int ret;
 
@@ -2028,11 +2030,8 @@  int offline_pages(unsigned long start_pfn, unsigned long nr_pages,
 
 			ret = scan_movable_pages(pfn, end_pfn, &pfn);
 			if (!ret) {
-				/*
-				 * TODO: fatal migration failures should bail
-				 * out
-				 */
-				do_migrate_range(pfn, end_pfn);
+				if (do_migrate_range(pfn, end_pfn) && ++tries == 5)
+					ret = -EBUSY;
 			}
 		} while (!ret);